ABSTRACT

The National Institute of Standards and Technology (NIST) has developed a standard reference form-based
handprint recognition system for evaluating optical character recognition. NIST is making this recognition system
freely available to the general public on CD-ROM. This is a source code distribution written primarily in C with two
additional utilities having FORTRAN components. Library utilities are provided with the recognition system for
conducting form registration, form removal, field isolation, field segmentation, character normalization, feature
extraction, character classification, and dictionary-based postprocessing. A host of data structures and low-level
utilities are also provided. These utilities include the application of spatial histograms, Least-Squares fitting,
spatial zooming, connected components, Karhunen Loeve feature extraction, Probabilistic Neural Network
classification, multiple-key sorting, dynamic string alignment, and dictionary matching. The recognition system has
been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital
Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. A CD-ROM can be
obtained free of charge by sending a letter of request to NIST. This report documents the system in terms of its
installation, organization, and functionality.


7.  INTRODUCTION 

The National Institute of Standards and Technology (NIST) has developed a standard reference form-based
handprint recognition system for evaluating optical character recognition (OCR). NIST is making this recognition
system freely available to the general public on CD-ROM. This report documents the system in terms of its
installation, organization, and functionality. The standard reference recognition system is designed to run on UNIX
workstations and has been successfully compiled and tested on a Digital Equipment Corporation (DEC) Alpha, Hewlett
Packard (HP) Model 712/80, IBM RS6000, Silicon Graphics Incorporated (SGI) Indigo 2, SGI Onyx, SGI Challenge, Sun
Microsystems (Sun) IPC, Sun SPARCstation 2, Sun 4/470, and a Sun SPARCstation 10.* The standard reference
recognition system runs on computers with as little as 8 Megabytes of memory, but this is not recommended because
the system is resource intensive.

The source code for the main system, hsfsys, is written in C (traditional K&R, not ANSI) and is organized into
11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code
is provided for form registration, form removal, field isolation, field segmentation, character normalization,
feature extraction, character classification, and dictionary-based postprocessing. A host of data structures and
low-level utilities are also provided. These utilities include the application of CCITT Group 4 decompression, IHead
file manipulation, spatial histograms, Least-Squares fitting, spatial zooming, connected components, Karhunen Loeve
feature extraction, Probabilistic Neural Network classification, multiple-key sorting, Levenshtein distance dynamic
string alignment, and dictionary matching.
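
As an illustration of the Levenshtein distance named above, the following is a minimal sketch in ANSI C. It is not
the routine shipped in the distribution; the function names levenshtein and min3 are hypothetical, and the unit cost
assigned to each insertion, deletion, and substitution is an assumption made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three integers. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Levenshtein distance between strings s and t: the minimum number of
 * single-character insertions, deletions, and substitutions (each with
 * an assumed cost of 1) needed to turn s into t.
 * Returns -1 if working memory cannot be allocated.
 */
int levenshtein(const char *s, const char *t)
{
    size_t n = strlen(s), m = strlen(t), i, j;
    int *prev, *curr, *tmp, result;

    /* Only two rows of the dynamic programming table are kept. */
    prev = malloc((m + 1) * sizeof *prev);
    curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: turning the empty prefix of s into t[0..j) takes j insertions. */
    for (j = 0; j <= m; j++)
        prev[j] = (int)j;

    for (i = 1; i <= n; i++) {
        curr[0] = (int)i;               /* i deletions reach the empty string */
        for (j = 1; j <= m; j++) {
            int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,          /* deletion     */
                           curr[j - 1] + 1,      /* insertion    */
                           prev[j - 1] + cost);  /* substitution */
        }
        tmp = prev; prev = curr; curr = tmp;     /* reuse the rows */
    }

    result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

A dictionary-based postprocessor could, for instance, compare a recognized word against each dictionary entry with
such a function and keep the entry with the smallest distance; whether hsfsys proceeds in exactly this way is not
stated in this section.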

Two other programs are provided that generate data files used by the recognition system. The first program,
mis2evt, computes a covariance matrix and generates eigenvectors from a sample of segmented character images. The
second program, mis2pat, produces prototype feature vectors for neural network training and classification. These
feature vectors are computed using segmented character images and eigenvectors. Unlike the recognition system, which
is written entirely in C, these two programs contain FORTRAN 77 components. To support these programs, a training
set of 168,365 segmented and labeled character images is provided in the distribution. About 1000 writers
contributed to this training set.

* Specific hardware and software products identified in this paper were used in order to adequately support the
development of the technology described in this document. In no case does such identification imply recommendation
or endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment
identified is necessarily the best available for the purpose.

A CD-ROM distribution of this standard reference system can be obtained free of charge by sending a letter
of request to Michael D. Garris at the address above. Requests made by electronic mail will not be accepted. The
letter, preferably on company letterhead, should identify the requesting organization or individuals. This system or
any portion of this system may be used without restrictions because it was created with U.S. government funding.
Redistribution of this standard reference system is strongly discouraged as any subsequent corrections or updates
will be sent to registered recipients only. This software was produced by NIST, an agency of the U.S. government,
and by statute is not subject to copyright in the United States. Recipients of this software assume all
responsibilities associated with its operation, modification, and maintenance.

Hsfsys processes the Handwriting Sample Forms distributed with NIST Special Database 1 (SD1) and NIST
Special Database 3 (SD3). NIST Special Database 1 contains 2,100 full page images of handwriting samples printed
by 2,100 different writers geographically distributed across the United States with a sampling roughly proportional
to population density. The writers used in this collection were permanent Census field representatives experienced
in filling out forms. NIST Special Database 3, a CD-ROM containing 313,389 segmented and labeled character images,
was extracted from SD1 and contains the original HSF forms as well. The forms were scanned at 12 pixels per
millimeter (300 dots per inch - dpi) binary and contain entry fields demarcated by boxes, one box for the entire
field value.
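
The relation between the two stated resolutions can be checked with a short calculation. The sketch below, in
ANSI C, converts dots per inch to pixels per millimeter and shows that 300 dpi corresponds to about 11.8 pixels per
millimeter, which the text rounds to 12. The function names are hypothetical, and the page size mentioned in the
usage note is an assumption for illustration, not a property of the HSF images stated in this report.

```c
#define MM_PER_INCH 25.4   /* exact by definition of the inch */

/* Convert a scanning resolution in dots per inch to pixels per millimeter. */
double dpi_to_pixels_per_mm(double dpi)
{
    return dpi / MM_PER_INCH;
}

/* Number of pixels spanned by a length in inches, rounded to the nearest pixel. */
long pixels_for_length(double inches, double dpi)
{
    return (long)(inches * dpi + 0.5);
}
```

For example, dpi_to_pixels_per_mm(300.0) evaluates to roughly 11.81, and a hypothetical 8.5 by 11 inch page scanned
at 300 dpi would span 2550 by 3300 pixels.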

Each of the 2,100 forms in SD1 is an image of a structured form filled in by a unique writer. A single field
template specifying the number of entry fields, their size and location, was used. An image of a completed form is
shown in Figure 1. The form is comprised of 3 identification boxes, 28 digit boxes of varying length, a randomly
ordered lower case alphabet, a randomly ordered upper case alphabet, and a handprinted text paragraph containing the
Preamble to the U.S. Constitution. Notice that the first field, the name field, has been covered with black pixels
making the writer of each form anonymous. Hsfsys has been designed to read all but the first 3 identification
fields.

There are 10 HSF forms provided with this distribution. In addition, there is one blank form provided both in
LaTeX and PostScript formats that can be printed, filled in, scanned, and then recognized by hsfsys. For additional
HSF forms, SD1 and SD3 may be purchased by contacting:

Standard Reference Data
NIST
221/A323
Gaithersburg, MD 20899
voice: (301) 975-2208
FAX: (301) 926-0416
email: srdata@enh.nist.gov

Section 8 gives installation instructions and discusses the organization, compilation, and invocation of the
hsfsys system. Section 9 documents the functionality of the provided software. Section 10 presents some performance
and timing results, and Section 11 contains a few final comments. This report also has three Appendices. Appendix A
documents the invocation and functionality of mis2evt, while Appendix B documents the invocation and functionality
of mis2pat. Appendix C compares the NIST standard reference recognition system to the results reported from the
Second Census Optical Character Recognition Systems Conference.

[Figure 1: scanned image of a completed Handwriting Sample Form, not reproduced here.]

Figure 1. Completed Handwriting Sample Form from SD1.

8. INSTALLATION INSTRUCTIONS


8.1  Installing  from  CD-ROM 

Hsfsys  is  distributed  on  CD-ROM  in  the  ISO-9660  data  format.  This  format  is  widely  supported  on  UNIX 
workstations,  DOS  personal  computers,  and  VMS  computers.  Therefore,  the  distribution  can  be  read  and  downloaded 
onto  these  various  platforms.  Keep  in  mind  that  the  source  code  has  been  developed  to  run  on  UNIX  workstations.  It 
is  the  responsibility  of  the  recipient  to  modify  the  distribution  source  code  so  that  it  will  execute  on  their  particular 
computer  architectures  and  operating  systems. 

Upon receiving the CD-ROM, load the disc onto your computer using a CD-ROM drive equipped with a
device driver that supports the ISO-9660 data format. You may need to be assisted by your system administrator as
mounting a file system usually requires root permission. Then recursively copy the disc contents into a
read-writable file system. The entire distribution requires approximately 150 Megabytes upon compilation. The
top-level distribution directory doc contains just under 72 Megabytes of PostScript reference documents. These files
are not necessary to compile and run the standard reference recognition system. Therefore, they do not have to be
copied off of the CD-ROM if disk space is limited on your computer, in which case the entire distribution requires
approximately 80 Megabytes upon compilation. For example, the CD-ROM can be mounted and the entire distribution
copied with the following UNIX commands on a Sun SPARCstation:

# mount -v -t hsfs -o ro /dev/sr0 /cdrom
# mkdir /usr/local/hsfsys
# cp -r /cdrom /usr/local/hsfsys
# umount -v /cdrom

where /dev/sr0 is the device file associated with the CD-ROM drive, /cdrom represents the directory to which the
CD-ROM is mounted, and /usr/local/hsfsys is the directory into which the distribution is copied. If the distribution
is installed as the root user, it may be desirable to change ownership of the installation directory using the chown
command. CD-ROM is a read-only medium, so copied directories and files are likely to retain read-only permissions.
The file permissions should be changed using the chmod command so that directories and scripts within the copied
distribution are readable, writable, and executable. All catalog files should be changed to be read-writable. In
general, source code files can remain read-only. Section 8.2 identifies the location of these various file types
within the distribution. Specifically, the file bin/catalog.csh must be assigned executable permission, and files
with the name catalog.txt under the top-level src directory must be assigned read-writable permission.

By default, the distribution assumes the installation directory to be /usr/local/hsfsys. If this directory is
used, the software can be compiled directly without any path name modifications. To minimize installation
complexity, the directory /usr/local/hsfsys should be used if at all possible. If insufficient space exists in your
/usr/local file system, the installation can be copied elsewhere and referenced through a symbolic link from
/usr/local/hsfsys.

If you decide to install this distribution in some other directory, then editing a number of source code files
will be necessary prior to compiling the programs. Edit the line "PROJDIR = /usr/local/hsfsys" in the file
makefile.mak in the top-level installation directory, replacing /usr/local/hsfsys with the full path name of the
installation directory you have chosen. Likewise replace all references to /usr/local/hsfsys in the files
histgram.h, hsfsys.h, and invbytes.h found in the top-level directory include. Remember, to make these file
modifications, the permissions of these files will have to be changed first. Once these edits are made, follow the
instructions in Section 8.3 for compilation.

<installation directory>
    bin  data  dict  doc  include  lib  lut  src  tmplt  train  weights

Figure 2. The top-level directory structure in the software distribution.

The top-level directories in this distribution are shown in Figure 2. The first directory bin holds all
distributed shell scripts and locally compiled programs that support the recognition system. The full path name to
this directory should be added to your environment's search path prior to compilation. Upon successful compilation,
the programs hsfsys, mis2evt, and mis2pat are installed in the top-level bin directory. The invocation of hsfsys is
discussed in Section 8.4, while the invocation of mis2evt and mis2pat is discussed in the appendices. The directory
bin also contains the file catalog.csh that must be assigned executable permission. This file is a C-shell script
that is used to automatically catalog programs and library routines, which is discussed in Section 8.3.

The directory data contains 10 subdirectories f0000_14 through f0009_06 containing completed forms from
SD1. Each subdirectory holds the form image in an IHead format file with extension pct, a reference file with
extension ref listing the values the writer was instructed to enter in each field, and two system output files
generated by hsfsys running at NIST. The first output file, a hypothesis file, has an extension of nhy and lists the
system's recognized values for each field. The second output file, a confidence file, has an extension of nco and
lists the corresponding confidence values for each character classification reported in the hypothesis file. The
format of these files is presented in Section 9.2.3.

The directory dict contains the dictionary file const.mfs listing in alphabetical order all the words present in
the Preamble to the U.S. Constitution. The directory include holds all the header files that contain constants and
data structure definitions required by the system source code. The directory lib holds all locally compiled object
code libraries used in compiling the distribution programs. The directory lut contains two lookup tables,
bitcount.lut and inv_byte.lut. The file bitcount.lut contains a lookup table used to determine the number of black
pixels in a given byte of binary image data. The file inv_byte.lut contains a lookup table used to reverse the bit
pattern within a given byte of data. The directory src contains all the source code files (excluding the header
files in top-level directory include) provided with the recognition system distribution. The organization of src
subdirectories is discussed in Section 8.2.1.
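
To illustrate what the two lookup tables hold, the following ANSI C sketch computes equivalent tables in memory
rather than reading them from bitcount.lut and inv_byte.lut. The on-disk format of those files is not described
here, so the array and function names are hypothetical, and the convention that a set bit represents a black pixel
is an assumption.

```c
/* 256-entry tables indexed by byte value. */
static unsigned char bit_count[256];  /* number of set bits in the byte  */
static unsigned char inv_byte[256];   /* byte with its bit order reversed */

/* Fill both tables; call once before using them. */
void build_byte_tables(void)
{
    int v, b;

    for (v = 0; v < 256; v++) {
        unsigned char count = 0, rev = 0;

        for (b = 0; b < 8; b++) {
            if (v & (1 << b)) {
                count++;
                /* bit b (counted from the low end) moves to bit 7 - b */
                rev |= (unsigned char)(0x80 >> b);
            }
        }
        bit_count[v] = count;
        inv_byte[v] = rev;
    }
}

/*
 * Count black pixels in a packed binary image buffer, assuming one bit
 * per pixel and that a set bit denotes a black pixel.
 */
long count_black_pixels(const unsigned char *data, long nbytes)
{
    long i, total = 0;

    for (i = 0; i < nbytes; i++)
        total += bit_count[data[i]];
    return total;
}
```

Indexing a precomputed table replaces eight shift-and-test operations per byte with a single array access, which
matters when every byte of a full-page binary image must be examined.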

This software distribution provides a number of PostScript reference documents contained in the top-level
directory doc. The PostScript file for this specific document is hsfsys.ps. All the other files in this directory
are papers and reports published by NIST that are referenced within this document. These files have been assigned
names according to their reference numbers listed on pages 58 and 59. All but three files in doc are PostScript
documents ending with the extension ps. The files ref_05.tar and ref_27.tar were created with the UNIX tar command,
and they contain multiple PostScript files. For example, the PostScript files contained in the file ref_05.tar can
be extracted into the current working directory using the following command:

# tar  xvf  ref_05.tar 

The files ref_12_1.ps and ref_12_2.z contain the Second Census Optical Character Recognition Systems Conference
report. The first part is a PostScript file, whereas the second part is a UNIX compressed tar file. To extract the
PostScript files archived in ref_12_2.z, use the following command. Warning: extracting these files requires a large
amount of disk space.

# zcat < ref_12_2.z | tar xvf -

The directory tmplt contains files pertaining to HSF forms. A blank HSF form is provided in both LaTeX and
PostScript formats. The LaTeX file hsf_0.tex or the PostScript file hsf_0.ps can be printed, filled in, scanned at 12
pixels per millimeter (300 dpi), and then recognized by hsfsys. The points used to register an HSF form are stored
in the file hsfreg.pts, and the points defining the location of each HSF entry field are stored in the file
hsftmplt.pts. A registered blank HSF form image from which these points have been extracted is stored in the file
hsftmplt.pct, and a dilated version of this form used in form removal is stored in the file hsftmplt.d4.

A large sample of training data is provided in the top-level directory train. As mentioned earlier, there are
168,365 segmented and labeled handprint characters contained in this directory. In all there are 119,740 images of
handprint digits, 24,205 lower case letters, and 24,420 upper case letters. The handprint from about 1000 different
writers is represented in this set of character images, which is divided among two subdirectories td1 and td3. These
two subdirectories are further subdivided into groups of 25 writers. The images of segmented characters are stored
in the Multiple Image Set (MIS) file format, which was used to distribute character images in SD3. Each MIS file
ends with the extension mis. Those files beginning with d contain data related to handprint digits, files beginning
with l correspond to lower case letters, and files beginning with u correspond to upper case letters. The four digit
number embedded in each file name is an index identifying the writer. For each MIS file in the training set, there
is an associated classification file containing the identity of each character contained in the MIS file. These
classification files end with the extension cls. The first line in a classification file contains the number of
character images contained in the corresponding MIS file. All subsequent lines store the identity (in hexadecimal
ASCII representation) of each successive character image. MIS files containing images of lower case letters have a
second classification file associated with them that ends with the extension cus. These files store the identity of
each lower case letter as its corresponding upper case equivalent. For example, an image of the lower case character
k is stored in a cls file as 6b, whereas it is stored in a cus file as 4b (the hexadecimal ASCII representation for
the upper case character K). The labelling of lower case letters as upper case is used when classifying characters
in the Constitution box.
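
The classification file layout described above can be read with a few lines of code. The following ANSI C sketch
parses the text of a cls file that has already been loaded into memory. It is not the distribution's own reader: the
function names are hypothetical, and treating the count on the first line as a decimal number is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Parse the text of a classification file: a first line holding the number
 * of character images (assumed here to be decimal), followed by one
 * hexadecimal ASCII code per line. The decoded characters are stored in ids.
 * Returns the number of identities read, or -1 if the text is malformed or
 * holds more entries than max_ids.
 */
int parse_cls(const char *text, char *ids, int max_ids)
{
    char *end;
    long count, code;
    int i;

    count = strtol(text, &end, 10);
    if (end == text || count < 0 || count > max_ids)
        return -1;
    text = end;

    for (i = 0; i < count; i++) {
        code = strtol(text, &end, 16);   /* skips the preceding newline */
        if (end == text || code < 0 || code > 0x7f)
            return -1;                   /* missing entry or not 7-bit ASCII */
        ids[i] = (char)code;
        text = end;
    }
    return (int)count;
}

/*
 * Relabel lower case identities as upper case, mirroring the relation
 * between a cls file and its cus counterpart (6b becomes 4b).
 */
void fold_to_upper(char *ids, int n)
{
    int i;

    for (i = 0; i < n; i++)
        ids[i] = (char)toupper((unsigned char)ids[i]);
}
```

Given the text "3\n6b\n61\n7a\n", parse_cls stores the characters k, a, and z; applying fold_to_upper then yields K,
A, and Z, the labels a cus file would carry for the same images.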

The last top-level directory weights holds the files associated with feature extraction and character
classification. The files with the extension evt contain eigenvector basis functions used to compute Karhunen Loeve
coefficients. The pattern (or prototype) files with the extension pat contain training sets of Karhunen Loeve
prototype vectors and a search tree used by the Probabilistic Neural Network. Another type of file in this directory
contains class-based median vectors computed from the prototypes stored in the corresponding pat file. Median vector
files end with the extension med.

The evt files were computed using mis2evt discussed in Appendix A, and both the pat and med files were
computed using mis2pat discussed in Appendix B. The files td13_l.evt, td13_l.pat, and td13_l.med were computed from
24,205 lower case images in both train/td1 and train/td3 and are used to compute features and classify lower case
characters. The files td13_u.evt, td13_u.pat, and td13_u.med were computed from 24,420 upper case images in both
train/td1 and train/td3 and are used to compute features and classify upper case characters. The files td13_ul.evt,
td13_ul.pat, and td13_ul.med were computed from 48,625 images of both lower and upper case in train/td1 and
train/td3 and are used to compute features and classify characters for lower and upper case combined. The files
td3_d.evt, td3_d.pat, and td3_d.med were computed from 61,094 images of digits in train/td3 and are used to compute
features and classify segmented images of digits. Two additional sets of evt, pat, and med files are provided so
that computers with limited memory of at least 8 Megabytes are able to execute all options of the recognition
system. The files td3_ul_s.evt, td3_ul_s.pat, and td3_ul_s.med were computed from 24,684 images of both lower and
upper case only in train/td3, whereas td3_d_s.evt, td3_d_s.pat, and td3_d_s.med were computed from 21,293 images of
digits in train/td3. In general, recognition accuracy decreases as the number of prototypes is decreased. Therefore,
the larger pattern files should be used when possible.


2.2.1 Source Code Subdirectory

The organization of subdirectories under the top-level directory src is shown in Figure 3. The subdirectory
src/bin contains all program main routines. Included in this directory is a catalog.txt file providing a short
description of each program provided in this distribution. In this distribution there are three programs and
therefore three directories src/bin/hsfsys, src/bin/mis2evt, and src/bin/mis2pat. The first directory contains the
recognition system's main routine in the file hsfsys.c and a number of different architecture-dependent compilation
scripts used by the UNIX make utility. The use of the make utility is discussed in Section 2.3. The second directory
contains the main routine and compilation scripts for the program mis2evt. The third directory contains the main
routine and compilation scripts for the program mis2pat. The programs mis2evt and mis2pat have FORTRAN components,
therefore their corresponding source directories also contain FORTRAN source code files ending with the extension f.
These two directories, src/bin/mis2evt and src/bin/mis2pat, contain the only FORTRAN files in the entire
distribution. If your computer does not have a FORTRAN compiler, you won't be able to compile these two supporting
programs. However, all you need is a C compiler to be able to compile all the libraries in src/lib and run the
recognition system hsfsys. Upon successful compilation, the directories under src/bin will contain compiled object
files and a development copy of each program's executable file. Production copies of these programs are
automatically installed in the top-level directory bin.


src
    bin
        hsfsys  mis2evt  mis2pat
    lib
        dict  fet  hsf  ihead  image  mfs  mis  nn  phrase  stats  util

Figure 3. Directory hierarchy under the top-level directory src.


The subdirectory src/lib contains the source code for all the recognition system's supporting libraries. This
distribution has 11 libraries, each represented as a subdirectory under src/lib. Each library contains a suite of C
source code files designated with the extension c and a set of different architecture-dependent compilation scripts
designated with the root file name makefile. Also included in each library subdirectory is a catalog.txt file
providing a short description of each routine contained in that specific library. Upon successful compilation, each
library subdirectory under src/lib will contain compiled object files (with file extension o) and a development copy
of each library's archive file (with file extension a). Production copies of the library archive files are
automatically installed in the top-level directory lib. The dict library holds routines responsible for dictionary
manipulation and matching. The fet library is responsible for manipulating Feature (FET) structures and files. The
hsf library is responsible for form processing with respect to HSF forms. The ihead library is responsible for
manipulating IHead structures and files. The image library contains general image manipulation and processing
routines. The mfs library is responsible for manipulating Multiple Feature Set (MFS) structures and files. The mis
library is responsible for manipulating Multiple Image Set (MIS) structures and files. The nn library contains
general feature extraction and neural network routines. The phrase library holds routines responsible for processing
the segmented text from a multiple-line field like the Constitution box on HSF forms. The stats library contains
general statistics routines. Lastly, the util library contains a collection of miscellaneous routines. These various
structure definitions and file formats are defined in Section 3.


2.3 Automated Compilation Utility

Before compiling the recognition system distribution, the full path name to the top-level directory bin in the
installation directory must be added to your shell's executable search path. For example, if the distribution is
installed in /usr/local/hsfsys, your search path should be augmented to include /usr/local/hsfsys/bin. It may also
be necessary to edit the path names contained in a number of files as discussed in Section 2.1.

Source code compilation of the recognition system distribution is controlled through a system of hierarchical
compilation scripts used by the UNIX make utility. Each one of these scripts is contained in a file with the root
name makefile. This automated compilation system is responsible for installing all architecture-dependent source
code files and compilation scripts, clearing all compiled object files and development copies of libraries and
programs, automatically generating source code dependency lists, and installing production versions of libraries and
programs. One makefile.mak file exists in the top-level installation directory, and one makefile.mak file exists in
each of the src, src/bin, and src/lib subdirectories. These compilation scripts are architecture independent and
contain Bourne shell commands.

Man.  Model                                O.S.                 # Proc.*  RAM     arch
DEC   Alpha                                OSF/1 V1.3           1         32 Mb   osf
HP    Model 712/80                         HP-UX 9.03           1         64 Mb   hp
IBM   RS6000                               AIX 3.2.5            1         128 Mb  aix
SGI   Challenge (IP19)                     IRIX 5.2             8         512 Mb  sgi
SGI   Indigo 2 (IP22)                      IRIX 4.0.5H          1         128 Mb  sgi
SGI   Onyx (IP19)                          IRIX 5.1.1.3         4         512 Mb  sgi
Sun   SPARCserver 4/470                    SunOS 4.1.1          1         32 Mb   sun
Sun   SPARCstation IPC                     SunOS 4.1.2          1         8 Mb    sun
Sun   SPARCstation 2 (Weitek 80MHz CPU)    SunOS 4.1.3          1         64 Mb   sun
Sun   SPARCstation 10                      SunOS 4.1.3          1         32 Mb   sun
Sun   SPARCstation 10                      SunOS 5.2 (Solaris)  2         128 Mb  sol

Figure 4. Table of different computers on which the standard reference recognition system has been successfully
ported and tested, and for which architecture-dependent files are provided in the distribution.

* All computers, including those with multiple processors, were compiled and tested serially.

There are a number of architecture-dependent compilation scripts found within each program directory under
src/bin and each library directory under src/lib. This standard reference recognition system has been successfully
ported and tested on computers running various versions and releases of the UNIX operating system. These machines
are listed in the table shown in Figure 4. The table from left to right lists each computer's manufacturer, model,
operating system, number of processors, amount of main memory, and an architecture identifier. There are numerous
differences between these different computers and their operating systems. Common discrepancies include differences
in the syntax of compilation scripts and built-in macro definitions; some operating systems require manually
building the symbol table in archived library files, while other systems update these symbol tables automatically;
every one of these operating systems has an install command, but each seems to require its own special set of
arguments; finally, each manufacturer's compilers have different options and switches for controlling language
syntax and optimization. To account for these variations, there are architecture-dependent compilation scripts
provided for each program and library in the distribution. These compilation scripts have the root file name
makefile and end with an extension identifying their corresponding architecture. The right column in Figure 4 lists
the set of extensions used to identify architecture groups for the computers and operating systems tested.

There are also a number of architecture-dependent source code files provided in the distribution. These files
share the same root file name and end with an architecture-identifying extension consistent with those used for
compilation scripts. Architecture-dependent source code files exist in mis2evt and mis2pat to support the calling of
FORTRAN subroutines from C. Some compilers require the C-side caller to include an underscore after the FORTRAN
subroutine name, whereas other compilers require that no underscore be present. There are other architecture-dependent
source code files provided to support DEC-like machines that use a different byte order to represent unformatted binary
data. All unformatted binary data files provided in this distribution were created on machines using the Motorola-based
byte order. When these files are read by a machine using an Intel-based byte order, the bytes must be swapped before
the data can be used. The overhead of swapping the bytes in these data files can be avoided by regenerating them with
locally compiled versions of mis2evt and mis2pat on your computer according to the instructions provided in the
appendices.
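
The byte swapping described above can be illustrated with a minimal sketch. It assumes the data are 32-bit values stored most significant byte first (the Motorola-based order); the function names swap_u32, host_is_little_endian, and be32_to_host are hypothetical and are not taken from the distribution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Return 1 if the host stores the least significant byte first
   (the Intel-based order), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Convert n values that were read from a big-endian file into
   host order, in place. Nothing is done on a big-endian host. */
void be32_to_host(uint32_t *buf, size_t n)
{
    size_t i;

    assert(buf != NULL || n == 0);
    if (!host_is_little_endian())
        return;
    for (i = 0; i < n; i++)
        buf[i] = swap_u32(buf[i]);
}
```

A reader would call be32_to_host() once on each buffer immediately after reading it from disk; regenerating the data files locally, as the text recommends, removes the need for this step.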

It was stated earlier that the automated compilation system is responsible for installing all architecture-dependent
source code files and compilation scripts, clearing all compiled object files and development copies of libraries
and programs, automatically generating source code dependency lists, and installing production versions of libraries
and programs. These tasks are initiated by invoking the make command at the top-level installation directory. All
subsequent lower-level makefile.mak scripts are invoked automatically in a prescribed order, and the 19,000 lines of source
code are automatically maintained and object files kept up to date. The make command can be invoked from the
location of any lower-level makefile.mak file and thereby isolate specific portions of the source code for recompilation.
However, the details of doing this are slightly involved and are left to the installer to pursue.

The standard reference recognition system in src/bin/hsfsys, and its supporting libraries under src/lib, are all
coded in C. Two other utilities located in src/bin/mis2evt and src/bin/mis2pat have FORTRAN components that require
a FORTRAN 77 compiler. The software distribution has been organized so that you can compile the recognition system
even though your computer may not have an installed FORTRAN compiler. To remove mis2evt and mis2pat from the
hierarchical compilation, edit the file src/bin/makefile.mak, removing mis2evt and mis2pat from the assignment to the
variable “SUBS”.


Assuming the installation directory is /usr/local/hsfsys, the following steps are required to compile the
distribution for the first time on your computer:

# cd /usr/local/hsfsys

# make -f makefile.mak instarch INSTARCH=<arch>

# make -f makefile.mak bare

# make -f makefile.mak depend

# make -f makefile.mak install

The first make invocation uses the instarch option to install architecture-dependent files required to support
the compilation and execution of the distribution’s programs and libraries. The actual architecture is defined by
replacing the argument <arch> with one of the extensions listed in Figure 4. For example, “INSTARCH=sun” must be used
to compile the distribution on computers running SunOS 4.1.X. If you are installing this software on a machine not
listed in Figure 4, you first need to determine which set of architecture-dependent files is most similar to those required
by your particular computer. Invoke make using the instarch option with INSTARCH set to the closest known
architecture. Then, edit the resulting makefile.mak files in the subdirectories under src/bin and src/lib according to the
requirements of your machine. One other hint: if you are compiling on a Solaris (SunOS 5.X) machine using the parallel
make utility, you may have to add a “-R” option prior to the “-f” option for each of the make invocations.

The bare option causes the compilation scripts to remove all temporary, backup, core, and object files from
the program directories in src/bin and the library directories in src/lib. The depend option causes the compilation
scripts to automatically generate source code dependency lists and modify the makefile.mak files within the program
and library directories. Your C compiler may not have this capability, in which case you may want to generate the
dependency lists by hand. The install option builds source code dependency lists as needed, compiles all program and
library source code files, and installs compiled libraries and programs into their corresponding production directories.
Compiled libraries are installed in the installation top-level directory lib. Compiled programs are installed in the
installation top-level directory bin.

One other capability, the automatic generation of catalog files, has been incorporated into the hierarchical
compilation scripts. A formatted comment header is included at the top of every program and library source code file
in the recognition system distribution. When the install option is used, the low-level makefile.mak files invoke the
C-shell script bin/catalog.csh. The script catalog.csh extracts all source code headers associated with all the programs or
a specific library in the distribution and compiles a catalog.txt file. A catalog.txt file exists in the subdirectory src/bin,
and one catalog.txt file exists in each of the library directories in src/lib. This provides a convenient and quick reference
to the source code provided in the distribution.

8.4 System Invocation

This  section  describes  how  the  recognition  system  program  hsfsys  is  invoked  and  controlled  from  the  com- 
mand line.  Once  you  have  successfully  compiled  the  software  distribution  on  your  computer,  the  recognition  system 
can  be  tested  on  the  HSF  forms  provided  in  the  top-level  installation  directory  data. 

The  recognition  system  is  run  in  batch  mode  with  image  file  inputs  and  ASCII  text  file  outputs,  and  the  system 
contains  no  Graphical  User  Interface.  The  command  line  usage  of  hsfsys  is  as  follows: 

# hsfsys

Usage:

hsfsys [options] <hsf file> <output root>

        -d          process digit fields
        -l          process lower case fields
        -u          process upper case fields
        -c nodict   process Constitution field without dictionary
        -c dict     process Constitution field using dictionary
        -m          small memory mode
        -s          silent mode
        -v          verbose mode
        -t          compute and report timings

The command line arguments for hsfsys are organized into option specifications, followed by an input file
name specification, and an output file name specification. The options can be subgrouped into three general types (field
type options, memory control options, and message control options).

Field  type  options: 

-d designates the processing of the digit fields on an HSF form.

-l designates the processing of the lower case field on an HSF form.

-u designates the processing of the upper case field on an HSF form.

-c designates the processing of the Constitution field on an HSF form. This option requires an argument.
If the argument nodict is specified, then no dictionary-based postprocessing is performed and the raw
character classifications and associated confidence values are reported. If the argument dict is
specified, then dictionary-based postprocessing is performed and matched words from the dictionary are
reported without any confidence values.

The options -dluc can be used in any combination. For example, use only the -l option to process the lower
case field, or use only the -d option to process all of the digit fields. If processing both lower case and upper
case fields, then specify both options -l and -u (or an equivalent syntax -lu). The system processes all of the
fields on the form if no field type options are specified, and dictionary-based postprocessing is performed on
the Constitution field by default.

Memory  control  options: 

-m specifies the use of alternative prototype files for classification that have fewer training patterns, so that
machines with limited main memory may be able to completely process all the fields on an HSF form.
In general, decreasing the number of training prototypes reduces the accuracy of the recognition
system’s classifier. It is recommended that this option be used only when necessary.

Message control options:

-s  specifies  that  the  silent  mode  is  to  be  used  and  all  messages  sent  to  standard  output  and  standard  error 
are  suppressed  except  upon  the  detection  of  a fatal  internal  error.  Silent  mode  facilitates  silent  batch 
processing  and  overrides  the  verbose  mode  option.  By  default,  the  system  posts  its  recognition  results 
to  standard  output  as  each  field  is  processed. 

-v specifies that the verbose mode is to be used so that messages providing a functional trace through the
system are printed to standard error.

-t  specifies  that  timing  data  is  to  be  collected  on  system  functions  and  reported  to  a timing  file  upon  sys- 
tem completion. 

File  name  specifications: 

<hsf file> specifies the binary HSF image in IHead format that is to be read by the system.

<output root> specifies the root file name that is used as the prefix of each output file generated by the
system. Upon completion, the system will create a hypothesis file with the extension hyp
and a confidence file with the extension con. If the -t option is specified, a timing file with
the extension tim will also be created.
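
The naming convention for the output files can be sketched in C as follows. This is only an illustration of the root-plus-extension rule stated above; the helper make_output_name is hypothetical and is not a routine from the distribution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>.<ext>" in out.
   Returns 0 on success, -1 if the result does not fit in out_size bytes. */
int make_output_name(char *out, size_t out_size,
                     const char *root, const char *ext)
{
    int n;

    assert(out != NULL && root != NULL && ext != NULL);
    assert(strlen(ext) > 0);          /* e.g. "hyp", "con", or "tim" */

    n = snprintf(out, out_size, "%s.%s", root, ext);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With an output root of /tmp/foo, calling the helper with the extensions hyp, con, and tim yields the three file names that the system creates when the -t option is given.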

For example, to run the system in verbose mode on all the HSF fields on the form in data/f0000_14 and store
the system results in the same location with the same root name as the form, the following commands are equivalent
(assuming the installation directory is /usr/local/hsfsys). In each case, the files created by the system will be
/usr/local/hsfsys/data/f0000_14/f0000_14.hyp and /usr/local/hsfsys/data/f0000_14/f0000_14.con.

# hsfsys -v /usr/local/hsfsys/data/f0000_14/f0000_14.pct /usr/local/hsfsys/data/f0000_14/f0000_14

# hsfsys -v /usr/local/hsfsys/data/f0000_14/f0000_14{.pct,}

# (cd /usr/local/hsfsys/data/f0000_14; hsfsys -v f0000_14.pct ./f0000_14)

To run the system in silent mode on only the digit and upper case fields on the same form with results
including timing data all stored in /tmp with the root name foo, the following command can be used. In this example, the files
created by the system will be /tmp/foo.hyp, /tmp/foo.con, and /tmp/foo.tim.

# hsfsys -stdu /usr/local/hsfsys/data/f0000_14/f0000_14.pct /tmp/foo

9. SOFTWARE DOCUMENTATION


This  section  documents  the  overall  functionality  of  the  hsfsys  program.  Each  subsection  describes  one  of  the 
many  steps  conducted  by  the  standard  reference  recognition  system.  Included  with  each  subsection  heading  is  the  file 
and  subroutine  within  the  software  distribution  responsible  for  carrying  out  the  steps  described  therein.  Figures  such 
as  Figure  6 have  been  organized  in  order  to  provide  a top-down  functional  road  map  through  the  source  code  which  in 
turn  is  cross-referenced  to  the  documentation  in  this  section. 

The main routine is located in the distribution file src/bin/hsfsys/hsfsys.c. Figure 5 depicts the main routine
divided into five functional groups. The first group “DO HSF FORM” is responsible for processing an HSF form
image, dividing the image into separate fields. To accomplish this, the HSF form has to be registered so that any
distortion due to reproduction and scanning is removed. Once the image is registered, the pixel information comprising
the HSF form is removed. This involves erasing the black pixels in the image that comprise the form’s boxes and
instructions. The other four groups listed in Figure 5 represent field-level processing. These functions are respectively
responsible for reading the values handprinted in the digit fields, lower case field, upper case field, and the Constitution
box on an HSF form. The final step of writing the system results to output files is not included in the figure.

9.1 DO HSF FORM; src/lib/hsf/form.c; do_hsf_form()

Two processing steps are conducted on the input HSF form image. The HSF form is first registered and then
the form itself is removed from the image. Upon completion, the handprinted characters within each field on the form
are ready to be processed. Figure 6 lists the steps used to process the HSF form. The figure is divided into two parallel
lists. The left list contains functional titles assigned to each step, whereas the right list provides source code references
that cite the file and subroutine names within the software distribution. Both lists contain the section numbers
corresponding to where each topic is discussed in this document. For example, the topic “Transform Form Image” is
discussed in Section 9.1.2.1.4 and is performed by the subroutine fit_param3_image2() found in the file src/lib/image/
fitimage.c. The overall processing of the HSF form image is divided into an initialization step and a processing step.
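
The form removal step can be pictured as a bitwise operation on two registered binary images: every pixel that is black in a mask image of the blank form is cleared in the filled-in form. The sketch below assumes both images are packed 1-bit rasters of identical size with 1 representing black; the name subtract_mask is hypothetical, and the sketch does not reproduce the distribution's own nandbinimage() routine.

```c
#include <assert.h>
#include <stddef.h>

/* Clear every pixel of image that is black (1) in mask.
   Both buffers hold packed 1-bit pixels and are nbytes long. */
void subtract_mask(unsigned char *image, const unsigned char *mask,
                   size_t nbytes)
{
    size_t i;

    assert(image != NULL && mask != NULL);
    for (i = 0; i < nbytes; i++)
        image[i] &= (unsigned char)~mask[i];   /* image AND (NOT mask) */
}
```

Pixels that are white in the mask are left untouched, so handprint that does not overlap the form's boxes and instructions survives the subtraction.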


9.1.1 INITIALIZE FOR HSF FORM; src/lib/hsf/form.c; init_form()

Initializing the system to process an HSF form image involves reading two files, a file of an HSF form image
that has been filled in and a file containing a spatial field template. The HSF form image file is specified on the
command line when hsfsys is invoked. The field template file is defined internal to the source code and is provided in the
file tmplt/hsftmplt.pts. The field template defines the location of each entry field on the form. The formats of these two
files are discussed below.

9.1.1.1 READ FORM IMAGE; src/lib/image/readrast.c; ReadBinaryRaster()

Hsfsys expects input images to be in the IHead file format. Image file formats and effective data compression
are critical to the usefulness of these types of image recognition systems. HSF form image files must be digitized in
binary at 12 pixels per millimeter (300 dpi), must be 2560 pixels wide and 3300 pixels high, and can be
2-dimensionally compressed using CCITT Group 4. These are the same file format conventions used with the distribution of SD1
and SD3.

In this application, a raster image is a digital encoding of light reflected from discrete points on a scanned
form. The 2-dimensional area of the form is divided into discrete locations according to the resolution of a specified
grid. Each cell of this grid is represented by a single bit value 0 or 1 called a pixel; 0 represents a cell predominantly
white, 1 represents a cell predominantly black. Pixels are scanned from the 2-dimensional sampling grid, and they are
then stored as a 1-dimensional vector of bit values in raster order; left to right, top to bottom (row major). Upon
scanning, certain attributes such as image width and height are required to accurately interpret the 1-dimensional vector of
pixels as a 2-dimensional image.

3. HSFSYS

    3.1 DO HSF FORM
        3.1.1 INITIALIZE FOR HSF FORM
        3.1.2 PROCESS HSF FORM

    3.2 DO DIGIT FIELDS
        3.2.1 INITIALIZE FOR FIELDS
        For Each Digit Field
            3.2.2 PROCESS DIGIT FIELD
            3.2.3 STORE FIELD RESULTS
        End Loop
        3.2.4 DEALLOCATE FOR FIELDS

    3.3 DO LOWER CASE FIELD
        3.2.1 INITIALIZE FOR FIELDS
        3.3.1 PROCESS ALPHABETIC FIELD
        3.2.3 STORE FIELD RESULTS
        3.2.4 DEALLOCATE FOR FIELDS

    3.4 DO UPPER CASE FIELD
        3.2.1 INITIALIZE FOR FIELDS
        3.3.1 PROCESS ALPHABETIC FIELD
        3.2.3 STORE FIELD RESULTS
        3.2.4 DEALLOCATE FOR FIELDS

    3.5 DO CONSTITUTION FIELD
        3.2.1 INITIALIZE FOR FIELDS
        3.5.1 PROCESS CONSTITUTION FIELD
        3.2.3 STORE FIELD RESULTS
        3.2.4 DEALLOCATE FOR FIELDS

Figure 5. Functionality of the system’s main routine src/bin/hsfsys/hsfsys.c.

3.1.2.2    REMOVE FORM             3.1.2.2    src/lib/hsf/rmform.c; remove_form()
3.1.2.2.1  Read Form Mask Image    3.1.2.2.1  src/lib/image/readrast.c; ReadBinaryRaster()
3.1.2.2.2  Subtract Form Pixels    3.1.2.2.2  src/lib/image/binlogop.c; nandbinimage()

Figure 6. Steps to process the HSF form.


NIST has designed a header structure called IHead to hold these attributes and has developed a file
interchange format based on this header. Numerous image formats exist; some are widely supported on small personal
computers, others supported on larger workstations; most are proprietary formats; few are public domain. IHead is an
attempt to design an open image format which can be universally implemented across heterogeneous computer
architectures and environments. IHead has been successfully ported and tested on several systems including UNIX
workstations and servers, DOS personal computers, and VMS mainframes. IHead has been designed with an extensive set
of attributes in order to: adequately represent both binary and gray level images; represent images captured from
different scanners and cameras; and satisfy the image requirements of diverse applications, including but not limited to,
image archival/retrieval, character recognition, and fingerprint classification.


/*
 * File Name: IHead.h
 * Package:   NIST Internal Image Header
 * Author:    Michael D. Garris
 * Date:      2/08/90
 */

/* Defines used by the ihead structure */
#define IHDR_SIZE   288  /* len of hdr record (always even bytes) */
#define SHORT_CHARS 8    /* # of ASCII chars to represent a short */
#define BUFSIZE     80   /* default buffer size */
#define DATELEN     26   /* character length of date string */

typedef struct ihead {
    char id[BUFSIZE];              /* identification/comment field */
    char created[DATELEN];         /* date created */
    char width[SHORT_CHARS];       /* pixel width of image */
    char height[SHORT_CHARS];      /* pixel height of image */
    char depth[SHORT_CHARS];       /* bits per pixel */
    char density[SHORT_CHARS];     /* pixels per inch */
    char compress[SHORT_CHARS];    /* compression code */
    char complen[SHORT_CHARS];     /* compressed data length */
    char align[SHORT_CHARS];       /* scanline multiple: 8|16|32 */
    char unitsize[SHORT_CHARS];    /* bit size of image memory units */
    char sigbit;                   /* 0->sigbit first | 1->sigbit last */
    char byte_order;               /* 0->highlow | 1->lowhigh */
    char pix_offset[SHORT_CHARS];  /* pixel column offset */
    char whitepix[SHORT_CHARS];    /* intensity of white pixel */
    char issigned;                 /* 0->unsigned data | 1->signed data */
    char rm_cm;                    /* 0->row maj | 1->column maj */
    char tb_bt;                    /* 0->top2bottom | 1->bottom2top */
    char lr_rl;                    /* 0->left2right | 1->right2left */
    char parent[BUFSIZE];          /* parent image file */
    char par_x[SHORT_CHARS];       /* from x pixel in parent */
    char par_y[SHORT_CHARS];       /* from y pixel in parent */
} IHEAD;

Figure  7.  C structure  definition  for  the  IHead  header. 


The IHead structure definition written in C and stored in include/ihead.h is listed in Figure 7, while Figure 8
lists the header values from an IHead file corresponding to these structure members. This header information belongs
to the isolated box image displayed in Figure 9 (scaled up 2X). Referencing the structure members listed in Figure 7,
the first attribute field of IHead is the identification field, id. This field uniquely identifies the image file, typically by
a file name. The attribute field, created, is the date on which the image was captured or digitized. The next three fields
hold the image’s pixel width, height, and depth. A binary image has a pixel depth of 1 whereas a gray scale image
containing 256 possible shades of gray has a pixel depth of 8. The attribute field, density, contains the scan resolution
of the image; in this case, 12 pixels per millimeter (300 dpi). The next two fields deal with compression.

In the recognition system distribution, input IHead images can be uncompressed or compressed using CCITT
Group 4. Whether the image is compressed or not, the IHead header is always uncompressed. This enables header
interpretation and manipulation without the overhead of decompression. The compress field is an integer flag which
signifies which compression technique, if any, has been applied to the raster image data which follows the header. If
the compression code is zero, then the image data is not compressed, and the data dimensions (width, height, and depth)
are sufficient to load the image into main memory. However, if the compression code is nonzero, then the complen
field must be used in addition to the image’s pixel dimensions. For example, the image described in Figure 8 has a
compression code of 2. By convention, this signifies that CCITT Group 4 compression has been applied to the image
data prior to file creation. The value in complen gives the number of bytes to read in order
to load the compressed block of data into main memory. Once the compressed image data has been loaded into
memory, CCITT Group 4 decompression can be used to produce an image which has the pixel dimensions consistent with
those stored in its header. A compression ratio of 20 to 1 is typically achieved using CCITT Group 4 compression on
the HSF form images provided in this distribution.
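
The rule for deciding how many bytes of raster data follow the header can be sketched as below. The sketch assumes the header fields have already been decoded to integers and that uncompressed scan lines are padded only to whole bytes; the function name raster_bytes is hypothetical.

```c
#include <assert.h>

/* Number of bytes of raster data that follow the IHead header.
   compress == 0 : derived from the pixel dimensions.
   compress != 0 : taken from the complen header field. */
long raster_bytes(long width, long height, long depth,
                  int compress, long complen)
{
    long bits_per_line;

    assert(width > 0 && height > 0 && depth > 0);
    if (compress != 0)
        return complen;                 /* compressed block length */

    bits_per_line = width * depth;
    return ((bits_per_line + 7) / 8) * height;  /* whole bytes per line */
}
```

For an uncompressed binary HSF form of 2560 by 3300 pixels this gives 1,056,000 bytes, so a 20 to 1 compression ratio corresponds to a compressed block of roughly 52,800 bytes.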


IMAGE FILE HEADER

Identity          box_03.pct
Header Size       288 (bytes)
Date Created      Thu Jan 4 17:34:21 1990
Width             656 (pixels)
Height            135 (pixels)
Bits per Pixel    1
Resolution        300 (ppi)
Compression       2 (code)
Compress Length   874 (bytes)
Scan Alignment    16 (bits)
Image Data Unit   16 (bits)
Byte Order        High-Low
MSBit             First
Column Offset     0 (pixels)
White Pixel       0
Data Units        Unsigned
Scan Order        Row Major, Top to Bottom, Left to Right
Parent            data/f0000_14/f0000_14.pct
X Origin          192 (pixels)
Y Origin          732 (pixels)

Figure 8. Contents of an IHead header listed in a formatted report.


Figure 9. Image belonging to the header listed in Figure 8.

The attribute field, align, stores the alignment boundary to which scan lines of pixels are padded. Pixel values
of binary images are stored 8 pixels (or bits) to a byte. In general, images are not an even multiple of 8 pixels in width.
In order to minimize the overhead of ending a previous scan line and beginning the next scan line within the same byte,
a number of padded pixels are provided in order to extend the previous scan line to an even byte boundary. Some
digitizers extend this padding of pixels out to an even multiple of 8 pixels, other digitizers extend this padding of pixels
out to an even multiple of 16 pixels. This field stores the image’s pixel alignment value used in padding out the ends
of raster scan lines.
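
The effect of scan line padding on pixel addressing can be sketched as follows. The sketch assumes a 1-bit-per-pixel raster stored row major with the most significant bit first in each byte; the names padded_width and get_pixel are hypothetical and are not taken from the distribution.

```c
#include <assert.h>
#include <stddef.h>

/* Round a scan line of width pixels up to the next multiple of
   align pixels (for example 8 or 16). */
size_t padded_width(size_t width, size_t align)
{
    assert(align > 0);
    return ((width + align - 1) / align) * align;
}

/* Read pixel (x, y) from packed 1-bit raster data whose scan lines
   are padded to align pixels. Returns 0 or 1. */
int get_pixel(const unsigned char *data, size_t width, size_t align,
              size_t x, size_t y)
{
    size_t bit;

    assert(data != NULL && x < width);
    bit = y * padded_width(width, align) + x;   /* bit index in raster */
    return (data[bit / 8] >> (7 - bit % 8)) & 1;
}
```

A 10-pixel-wide image padded to 16 pixels therefore occupies 2 bytes per scan line, and the 6 padding pixels at the end of each line are never addressed because x is always less than width.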

The next three attribute fields identify binary interchange issues among heterogeneous computer
architectures and displays. The unitsize field specifies how many contiguous pixel values are bundled into a single unit by the
digitizer. The sigbit field specifies the order in which bits of significance are stored within each unit; most significant
bit first or least significant bit first. The last of these three fields is the byte_order field. If unitsize is a multiple of
bytes, then this field specifies the order in which bytes occur within the unit. Given these three attributes, binary
incompatibilities across computer hardware and binary format assumptions within application software can be identified and
effectively dealt with.

The pix_offset attribute defines a pixel displacement from the left edge of the raster image data to where a
particular image’s significant image information begins. The whitepix attribute defines the value assigned to the color
white. For example, the binary image described in Figure 8 is black text on a white background and the value of the
white pixels is 0. This field is particularly useful to image display routines. The issigned field is required to specify
whether the units of an image are signed or unsigned. This attribute determines whether an image with a pixel depth
of 8 should have pixel values interpreted in the range of -128 to +127, or 0 to 255. The orientation of the raster scan
may also vary among different digitizers. The attribute field, rm_cm, specifies whether the digitizer captured the image
in row-major order or column-major order. Whether the scan lines of an image were accumulated from top to bottom,
or bottom to top, is specified by the field, tb_bt, and whether left to right, or right to left, is specified by the field, lr_rl.

The final attributes in IHead provide a single historical link from the current image to its parent image; the
one from which the current image was derived or extracted. In Figure 8, the parent field contains the full path name
to the image from which the image displayed in Figure 9 was extracted. The par_x and par_y fields contain the origin,
upper left hand corner pixel coordinate, from where the extraction took place from the parent image. These fields
provide a historical thread through successive generations of images and subimages.

We believe that the IHead image format contains the minimal amount of ancillary information required to
successfully manage binary and gray scale images. The IHead format is extremely diverse in its ability to represent a wide
variety of images. However, hsfsys requires a predetermined set of attributes to be used in the IHead structure. All HSF
form images must be 2560 by 3300 pixels in dimension. The images must be binary, one bit per pixel with 0
representing white and 1 representing black. The images can be either uncompressed or compressed using CCITT Group 4. The
binary raster data is assumed to be in a high-low byte order with the most significant bit first in a byte of pixel data.
The pix_offset attribute is not used, so all pixel data in the image is processed by hsfsys. Finally, the data units are
assumed to be unsigned and the scan order is left-to-right and top-to-bottom.
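
The attribute requirements listed above can be collected into a single check, sketched below. The structure hsf_attrs and the function hsf_attrs_ok are hypothetical: they assume the ASCII header fields have already been decoded into integers, that compression code 2 denotes CCITT Group 4, and that row-major order is intended, none of which is taken from the distribution's source code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, already-decoded subset of the IHead attributes. */
struct hsf_attrs {
    int width, height, depth;
    int compress;      /* 0 = none, 2 = CCITT Group 4 */
    int whitepix;      /* value of a white pixel */
    int sigbit;        /* 0 = most significant bit first */
    int byte_order;    /* 0 = high-low */
    int issigned;      /* 0 = unsigned */
    int rm_cm;         /* 0 = row major */
    int tb_bt;         /* 0 = top to bottom */
    int lr_rl;         /* 0 = left to right */
};

/* Return 1 if the attributes match what hsfsys is described as
   requiring of an HSF form image, 0 otherwise. */
int hsf_attrs_ok(const struct hsf_attrs *a)
{
    assert(a != NULL);
    return a->width == 2560 && a->height == 3300 && a->depth == 1 &&
           (a->compress == 0 || a->compress == 2) &&
           a->whitepix == 0 && a->sigbit == 0 && a->byte_order == 0 &&
           a->issigned == 0 &&
           a->rm_cm == 0 && a->tb_bt == 0 && a->lr_rl == 0;
}
```

Running such a check immediately after the header is read lets a program reject an unsuitable image before any decompression or field processing is attempted.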

The file format is illustrated in Figure 10. Each IHead image file is divided into an IHead header followed by
the image's raster data. Preceding the header is an 8-byte record containing the length of the IHead header. Both the
value of the length record and the header values themselves are represented in ASCII. The raster data following the
header is in the binary format described by the attribute values in the header and may be compressed. In this way, the
header portion of the IHead image always remains uncompressed and can be interpreted by heterogeneous computer
architectures. Applications that intend to manipulate the raster data of an IHead image are able to first read the ASCII
header containing the image's attributes and determine the proper interpretation of the data that follows it.

    Header Length (8 bytes)
    ASCII Format Image Header (288 bytes)
    Binary Raster Stream
        000000010000010000011111110 . . .
        • Representing the digital scan across the page left to right, top to bottom.
        • '0' - Represents a white pixel.
        • '1' - Represents a black pixel.
        • 8 pixels are packed into a single byte of memory.

Figure 10. Illustrated IHead file format for an uncompressed image.
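
The layout in Figure 10 can be illustrated with a minimal C sketch. It is not the distribution's own reader: the function names ihead_header_length() and raster_get_pixel() are hypothetical, the padding of the 8-byte length record (spaces or NUL bytes) is an assumption that the code tolerates either way, and the raster is assumed to be uncompressed with a width that is a multiple of 8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define IHEAD_LEN_BYTES 8   /* size of the ASCII length record (from the text) */

/* Parse the 8-byte ASCII length record that precedes the IHead header.
   Returns the header length in bytes, or -1 if no number could be parsed.
   (Hypothetical helper, not part of the hsfsys distribution.) */
long ihead_header_length(const unsigned char rec[IHEAD_LEN_BYTES])
{
    char buf[IHEAD_LEN_BYTES + 1];
    char *end;
    long len;

    memcpy(buf, rec, IHEAD_LEN_BYTES);
    buf[IHEAD_LEN_BYTES] = '\0';        /* the record itself may be unterminated */
    len = strtol(buf, &end, 10);        /* strtol skips leading blanks */
    return (end == buf || len < 0) ? -1 : len;
}

/* Return pixel (x, y) of an uncompressed binary raster: 1 = black, 0 = white.
   Rows are packed 8 pixels per byte, most significant bit first. The width is
   assumed to be a multiple of 8 (true for 2560-pixel-wide HSF forms). */
int raster_get_pixel(const unsigned char *raster, int width, int x, int y)
{
    int bytes_per_row = width / 8;
    unsigned char byte;

    assert(raster != NULL && width % 8 == 0);
    assert(x >= 0 && x < width && y >= 0);
    byte = raster[(size_t)y * bytes_per_row + x / 8];
    return (byte >> (7 - (x % 8))) & 1;
}
```

An application would read the first 8 bytes of the file, call ihead_header_length() to learn how many header bytes follow, and only then interpret the raster stream according to the header attributes.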

9.1.1.2 READ FIELD TEMPLATE; src/lib/hsf/hsftmplt.c; read_hsftmplt()

The spatial field template defines the location of each entry field on a registered HSF form. A registered form
is a form that has no distortion and in which the location of the pixels comprising the form is known. The field template
file is an ASCII file in which the first line contains the number of fields represented in the file. Each subsequent line in
the template file represents an independent field on the form. In all, there are 34 entry fields on an HSF form. All but the
first 3 fields are processed by hsfsys. All ASCII files used by hsfsys were generated and designed to run on UNIX
computers, so the end of each line is represented by the single line feed character with decimal representation 10 and
hexadecimal representation 0A (0x0A in C). This is different from ASCII files on DOS computers, which represent the end
of each line with two characters, a carriage return character 0x0D followed by a line feed character 0x0A. Each field is
represented as a rectangular region in the template file by a line consisting of 8 numbers. These 8 numbers represent
four (x, y) vertex pairs. The first pair represents the upper-left corner, the second pair represents the upper-right corner,
the third pair represents the lower-left corner, and the fourth pair represents the lower-right corner. Each number on a
line is separated by a space character 0x20 or a tab character 0x09.
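
As an illustration of the line format just described, the following C sketch parses one field line into four vertices. It is a hedged example rather than the code in hsftmplt.c: the FieldBox type and parse_field_line() are hypothetical names, and the coordinates are assumed to be integer pixel positions.

```c
#include <assert.h>
#include <stdio.h>

/* One entry field: four (x, y) vertices in the order given in the text.
   (Hypothetical type, not taken from the distribution.) */
typedef struct {
    int ul_x, ul_y;   /* upper-left corner  */
    int ur_x, ur_y;   /* upper-right corner */
    int ll_x, ll_y;   /* lower-left corner  */
    int lr_x, lr_y;   /* lower-right corner */
} FieldBox;

/* Parse one template line of 8 integers separated by spaces or tabs.
   Returns 1 on success and 0 if fewer than 8 numbers were found. */
int parse_field_line(const char *line, FieldBox *box)
{
    int n;

    assert(line != NULL && box != NULL);
    /* A blank in a scanf format matches any run of white space, so both
       space (0x20) and tab (0x09) separators are accepted. */
    n = sscanf(line, "%d %d %d %d %d %d %d %d",
               &box->ul_x, &box->ul_y, &box->ur_x, &box->ur_y,
               &box->ll_x, &box->ll_y, &box->lr_x, &box->lr_y);
    return n == 8;
}
```

A reader for the whole file would first read the field count from the first line and then call parse_field_line() once per subsequent line.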

The field template provided with this distribution is in tmplt/hsftmplt.pts. The field regions stored within this
file were measured from the blank registered form image tmplt/hsftmplt.pct.


9.1.2 PROCESS HSF FORM; src/lib/hsf/form.c; process_form2()

To process an HSF form, the image is first registered to remove any distortion introduced from reproduction
or scanning, then the pixels comprising the form's boxes and instructions are removed, leaving only the handprinted
data entered inside of each field in the image.

9.1.2.1 REGISTER FORM IMAGE; src/lib/hsf/reghsf.c; register_hsf2()

The HSF forms distributed with SD1 and SD3 were type-set on a computer and originally produced on paper
using a laser printer. Multiple copies of each form were then reproduced on a large photocopier. The copies were
bifolded into legal-size envelopes, mailed out, filled in by Census representatives, mailed back to NIST in business
return envelopes, and finally digitized through an automated document feeder on a scanner. This process produces
several sources of distortion in the final image. The distortion includes rotation, translation, scale, and fold distortions
that must be accounted for in order for the recognition system to reliably locate the data entered in each field on a form.
These types of distortions are detected and removed through a process known as form registration.

Hsfsys uses a registration technique based on Linear Least Squares where a set of predefined registration
marks in an input HSF form image are matched to marks on an ideal undistorted template. Global estimates of rotation,
translation, and scale are automatically computed and applied so that the input image is transformed to line up as well
as possible in a least squares sense with the ideal template.

9.1.2.1.1 Read Reference Points; src/lib/mfs/readmfs.c; readmfsint2()

A set of registration marks is needed in order to estimate the amount of distortion in an input image. These
registration marks correspond to structures easily detectable within an image of a form. These may be actual fiducial
marks or they may be structures embedded within the form itself. Six points were measured from the blank registered
HSF form tmplt/hsftmplt.pct. These points are stored in the file tmplt/hsfreg.pts and correspond in order to the top-left
corner of the leftmost 0 through 9 digit box, the top-left point on the H in the form's title "HANDWRITING SAMPLE
FORM", the top-right corner of the CITY-STATE-ZIP box, the top-left corner of the Constitution box, the bottom-left
corner of the Constitution box, and the bottom-right corner of the Constitution box. These registration points are
annotated on the HSF form shown in Figure 11 (scaled 0.5X).


Figure 11. Registration marks on an HSF form.

The six registration marks on the HSF form were selected so that they are distributed across the entire form
with a slight concentration of points in the top-left portion of the form. This concentration is due to hsfsys exhibiting
sensitivities to the top-left of the form when conducting form removal. Some of this sensitivity is known to be caused
by local distortions in the image where the form was folded when it was sent through the mail.

The file tmplt/hsfreg.pts is in a general NIST file format known as a Multiple Feature Set (MFS) file. MFS
files are editable ASCII files designed to contain lists of single or multi-column data where the data values residing on
the same line are strongly associated with each other. The first line in the file contains the number of subsequent lines
in the file. In the case of the reference points file, there are 6 subsequent lines in tmplt/hsfreg.pts, each containing an
(x, y) coordinate pair of numbers. A space or tab character is used to separate values within the same line, and lines
are terminated with the line feed character 0x0A. The library src/lib/mfs contains a suite of routines designed to read
and write MFS files and manipulate MFS structures.

Figure 12 lists the C definition of the MFS structure that is stored in include/mfs.h. The structure contains
three members. Values references an array of character strings, alloc holds the number of allocated positions within
values, and num holds the number of contiguous positions currently holding information in values. Each line after the
first in an MFS file is read into a single string, which in turn is stored in the next available position within values.
Multiple items on a single MFS file line are appended together in a single string. It is the responsibility of an application
to parse the independent items from the strings stored in the values array. In the case of tmplt/hsfreg.pts, the file is read
into an MFS structure and then the x and y coordinates are parsed into two separate integer arrays. The MFS file
convention provides a common I/O interface when manipulating editable lists of ASCII values. The items listed in an MFS
file can be integers, floating point numbers, names, and/or any sequence of printable ASCII characters.


typedef struct mfsstruct{
   int alloc;
   int num;
   char **values;
}MFS;

Figure  12.  C definition  for  the  MFS  structure. 
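
The parsing step left to the application can be sketched as follows. This is only an illustration under stated assumptions: the MFS definition is copied from Figure 12, while mfs_to_points() is a hypothetical helper (the distribution's own routine for this purpose is readmfsint2(), whose interface is not reproduced here).

```c
#include <assert.h>
#include <stdio.h>

/* MFS structure as listed in Figure 12. */
typedef struct mfsstruct {
    int alloc;      /* number of allocated positions in values   */
    int num;        /* number of positions currently holding data */
    char **values;  /* one string per line of the MFS file        */
} MFS;

/* Parse an "x y" pair from each string of an MFS structure into two
   integer arrays of capacity max_pts. Returns the number of points
   parsed, or -1 if a line is malformed or the arrays are too small.
   (Hypothetical helper for illustration only.) */
int mfs_to_points(const MFS *mfs, int *xs, int *ys, int max_pts)
{
    int i;

    assert(mfs != NULL && xs != NULL && ys != NULL);
    if (mfs->num > max_pts)
        return -1;
    for (i = 0; i < mfs->num; i++) {
        /* space or tab may separate the two values on a line */
        if (sscanf(mfs->values[i], "%d %d", &xs[i], &ys[i]) != 2)
            return -1;
    }
    return mfs->num;
}
```

For the reference points file, a caller would pass arrays with room for 6 points and expect a return value of 6.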

9.1.2.1.2 Locate Hypothesized Points; src/lib/hsf/hsfpoint.c; hsfpoints()

The amount of discrepancy between the registration marks on an ideal undistorted form and the position of
the corresponding marks in an input form image is used to estimate the amount of distortion in the input image. Hsfsys
uses spatial histogram projections to locate the position of these registration marks within the input HSF form image.
The spatial histograms represent black pixel densities aggregated across an image region either in a horizontal or
vertical orientation.
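
A minimal sketch of such projections is given below. It is not the code in hsfpoint.c: the function names are hypothetical, and for simplicity the image is assumed to be unpacked to one byte per pixel (nonzero = black) rather than stored 8 pixels per byte. A horizontal histogram here holds one count per row of the region, and a vertical histogram holds one count per column.

```c
#include <assert.h>
#include <stddef.h>

/* Horizontal projection: hist[r] = number of black pixels in row r of the
   region with origin (rx, ry) and size rw x rh. img is img_w pixels wide,
   one byte per pixel (assumption made for this sketch). */
void horiz_histogram(const unsigned char *img, int img_w,
                     int rx, int ry, int rw, int rh, int *hist)
{
    int r, c;

    assert(img != NULL && hist != NULL);
    assert(rx >= 0 && ry >= 0 && rx + rw <= img_w);
    for (r = 0; r < rh; r++) {
        hist[r] = 0;
        for (c = 0; c < rw; c++)
            hist[r] += img[(size_t)(ry + r) * img_w + rx + c] != 0;
    }
}

/* Vertical projection: hist[c] = number of black pixels in column c. */
void vert_histogram(const unsigned char *img, int img_w,
                    int rx, int ry, int rw, int rh, int *hist)
{
    int r, c;

    assert(img != NULL && hist != NULL);
    assert(rx >= 0 && ry >= 0 && rx + rw <= img_w);
    for (c = 0; c < rw; c++) {
        hist[c] = 0;
        for (r = 0; r < rh; r++)
            hist[c] += img[(size_t)(ry + r) * img_w + rx + c] != 0;
    }
}

/* Index of the first bin whose count reaches thresh, or -1 if none does.
   Scanning a histogram this way locates the leading edge of a structure. */
int first_bin_at_least(const int *hist, int n, int thresh)
{
    int i;

    for (i = 0; i < n; i++)
        if (hist[i] >= thresh)
            return i;
    return -1;
}
```

The threshold passed to first_bin_at_least() is a free parameter of this sketch; the values used by hsfsys are not stated in the text.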

Figure 13 contains an image region containing the second registration mark in tmplt/hsfreg.pts, the top-left
point on the H in the form's title "HANDWRITING SAMPLE FORM". The top image in the figure shows the
sequence of subimages on which spatial histograms are computed in order to locate the registration mark. Each
subimage has been assigned a number that corresponds to one of the spatial histograms displayed below the top image.

Horizontal histogram 1 is first computed on the entire image region. There are two bands of black in the
histogram. The top band represents the characters in the form's title. Vertical histogram 2 is computed on a subimage that
is centered about the top of the title determined from histogram 1. Histogram 2 is used to locate the left end of the title.
Horizontal histogram 3 is computed on a subimage that begins at the left edge of the title determined by histogram 2
and extends approximately the width of a single character. Histogram 3 is used to determine where the top of the H, the
first character in the title, begins. Vertical histogram 4 is computed on a subimage that begins at the top of the letter H
and extends downward approximately the height of a single character. This final histogram is used to determine the
left edge of the letter H, at which point the registration mark is located.

Figure 13. Locating a registration mark using spatial histograms.

As an image is increasingly rotated, the peaks in the histograms become shorter and they spread out wider,
making them decreasingly reliable and increasingly inaccurate. Therefore, the technique deployed in hsfsys carefully
reduces the scope of successive histogram projections, alternating between horizontal and vertical projections, until
the desired structure is accurately isolated. Hsfsys has been engineered and tested to tolerate up to 5 degrees of rotation
in combination with 1.27 cm (0.5 inches) of translation.

9.1.2.1.3 Compute Distortion Parameters; src/lib/stats/lsq3.c; chknfindparam3()

Once  the  registration  marks  are  located  on  a form,  parameters  estimating  the  amount  of  rotation,  translation, 
and  scale  can  be  computed.  The  estimation  of  distortion  parameters  is  embedded  in  a technique  for  detecting  form  reg- 
istration failures.  This  section  first  describes  how  distortion  parameters  are  used  to  detect  form  registration  failures  and 
then  presents  a method  for  deriving  these  distortion  parameters  using  Linear  Least  Squares  (LSQ). 

Figure 14 contains pseudocode for an algorithm that detects form registration failures. The technique
determines when registration points from within an input form image are incorrectly located. The recognition system can
confuse or miss registration points for a number of different reasons. For example, a form may be so distorted that it
cannot be corrected by the registration process. More frequently, an input form image has noise such as extraneous
marks or writing in the vicinity of a registration mark, or worse yet, this noise may occlude the registration mark
altogether. In the case where only one or two registration points are missed, if they can be detected, they can be removed
from the LSQ computation. In most cases, using the remaining located registration points is sufficient for successful
form registration.


    input:  located registration points - hyp_pts,
            ideal registration points - ref_pts
    while (# hyp_pts > rm_limit)
        params = compute distortion parameters (hyp_pts, ref_pts)
        for each point i in hyp_pts
            trans_pt = apply distortion transformation (hyp_pts[i], params)
            errors[i] = distance (trans_pt, ref_pts[i])
        end for
        max_pt = find maximum error point (errors)
        max_err = find maximum error (errors)
        if (max_err > err_limit) then
            remove max_pt from hyp_pts (and its corresponding point from ref_pts)
        else
            break from while
        endif
    end while
    if (# hyp_pts <= rm_limit) then
        output: "form registration failed"
    else
        output: "form registration successful", params
    endif


Figure  14.  Pseudocode  for  chknfindparam3(),  which  detects  form  registration  failures. 

Walking through the algorithm in Figure 14, the procedure accepts as input the set of located registration
points (hypothesis points) from an input form image. The procedure accepts a second set of corresponding points
(reference points) extracted from the position of the registration marks on an ideal undistorted form. While the number of
hypothesis points remaining in the analysis is more than a given threshold, in this case 3 points, the analysis continues.
For each pass through the while loop, distortion parameters are computed from the remaining hypothesis points and
their corresponding reference points. Then, using the new distortion parameters, each hypothesis point is transformed.
If the parameters are a good estimate of the actual distortion, the transformed points will be very close to their
corresponding reference points. An error distance is computed between each hypothesis and reference point pair. If the
maximum error distance from all the points exceeds a given threshold, the hypothesis point contributing to the maximum
error is removed from the analysis, and new distortion parameters are computed from the remaining hypothesis points.
This process continues until either there are too few hypothesis points remaining to accurately compute distortion
parameters, or the maximum error distance from all the remaining points falls below a specified threshold. If too few
points remain, the form registration is determined to have failed. Otherwise, form registration is determined to be
successful, and the last set of distortion parameters computed is used to transform the entire input form image. Hsfsys
uses an error threshold of 4, which was derived empirically from a set of independent studies.

The procedure chknfindparam3() is an encapsulation of a lower level procedure findparam3() also located in
src/lib/stats/lsq3.c. This lower level procedure is responsible for computing distortion parameters given the recognition
system's located hypothesis points and their corresponding reference points. These distortion parameters are
estimated using a method of LSQ and account for rotation, translation, and scale. A pair of linear equations, each using 3
unknowns, can be defined to account for these distortions.

    x_h = \Delta x + m_{xx} x_r + m_{xy} y_r                                  (1)

    y_h = \Delta y + m_{yy} y_r + m_{yx} x_r                                  (2)

Equation (1) is used to estimate the translation, rotation, and scale in x using the three unknown quantities
\Delta x, m_{xx}, and m_{xy}. Equation (2) is used to estimate the translation, rotation, and scale in y using the three
unknown quantities \Delta y, m_{yy}, and m_{yx}. In the first equation, the hypothesized x-coordinate, x_h, is linearly
dependent on the reference x-coordinate, x_r, and the reference y-coordinate, y_r. The same is true for the hypothesized
y-coordinates in the second equation. Here, reference points refer to the registration marks stored in the file
tmplt/hsfreg.pts corresponding to the blank registered form tmplt/hsftmplt.pct. The reference points are where the marks
should be located if the input image has absolutely no distortion whatsoever. Hypothesized points refer to the
registration marks located within the input HSF form image using spatial histograms.

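Applying the method of LSQ on Equation (1), the equation expands into the following system of three linear
equations, where each sum is taken over the n pairs of hypothesized and reference points.

    \sum_{i=1}^{n} x_{h_i} = n \Delta x + m_{xx} \sum_{i=1}^{n} x_{r_i} + m_{xy} \sum_{i=1}^{n} y_{r_i}                          (3)

    \sum_{i=1}^{n} x_{h_i} x_{r_i} = \Delta x \sum_{i=1}^{n} x_{r_i} + m_{xx} \sum_{i=1}^{n} x_{r_i}^2 + m_{xy} \sum_{i=1}^{n} x_{r_i} y_{r_i}   (4)

    \sum_{i=1}^{n} x_{h_i} y_{r_i} = \Delta x \sum_{i=1}^{n} y_{r_i} + m_{xx} \sum_{i=1}^{n} x_{r_i} y_{r_i} + m_{xy} \sum_{i=1}^{n} y_{r_i}^2   (5)
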


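This system of three simultaneous linear equations is represented in matrix form as:

    B = A P                                                                  (6)

where the sums run over the n point pairs and:

    B = \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}
      = \begin{bmatrix} \sum x_{h_i} \\ \sum x_{h_i} x_{r_i} \\ \sum x_{h_i} y_{r_i} \end{bmatrix}

    A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
      = \begin{bmatrix} n & \sum x_{r_i} & \sum y_{r_i} \\
                        \sum x_{r_i} & \sum x_{r_i}^2 & \sum x_{r_i} y_{r_i} \\
                        \sum y_{r_i} & \sum x_{r_i} y_{r_i} & \sum y_{r_i}^2 \end{bmatrix}

    P = \begin{bmatrix} p_{11} \\ p_{21} \\ p_{31} \end{bmatrix}
      = \begin{bmatrix} \Delta x \\ m_{xx} \\ m_{xy} \end{bmatrix}

Solving for P, the following equation is derived:

    P = A^{-1} B                                                             (7)

The inverse of the matrix A is defined to be:

    A^{-1} = \frac{1}{\det A} \operatorname{Adj} A                           (8)

The determinant of A is defined to be:

    \det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
           - a_{31}a_{22}a_{13} - a_{32}a_{23}a_{11} - a_{33}a_{21}a_{12}

Using cofactors, the adjoint of A is defined to be:

    \operatorname{Adj} A = \begin{bmatrix}
        (a_{22}a_{33} - a_{23}a_{32}) & (a_{13}a_{32} - a_{12}a_{33}) & (a_{12}a_{23} - a_{13}a_{22}) \\
        (a_{23}a_{31} - a_{21}a_{33}) & (a_{11}a_{33} - a_{13}a_{31}) & (a_{13}a_{21} - a_{11}a_{23}) \\
        (a_{21}a_{32} - a_{22}a_{31}) & (a_{12}a_{31} - a_{11}a_{32}) & (a_{11}a_{22} - a_{12}a_{21})
    \end{bmatrix}

Multiplying A^{-1} by B, using Equation (8) to compute A^{-1}, yields:

    P = A^{-1} B = \begin{bmatrix} p_{11} \\ p_{21} \\ p_{31} \end{bmatrix}

    p_{11} = \frac{b_{11}(a_{22}a_{33} - a_{23}a_{32}) + b_{21}(a_{13}a_{32} - a_{12}a_{33}) + b_{31}(a_{12}a_{23} - a_{13}a_{22})}{\det A}

    p_{21} = \frac{b_{11}(a_{23}a_{31} - a_{21}a_{33}) + b_{21}(a_{11}a_{33} - a_{13}a_{31}) + b_{31}(a_{13}a_{21} - a_{11}a_{23})}{\det A}

    p_{31} = \frac{b_{11}(a_{21}a_{32} - a_{22}a_{31}) + b_{21}(a_{12}a_{31} - a_{11}a_{32}) + b_{31}(a_{11}a_{22} - a_{12}a_{21})}{\det A}
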

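The LSQ parameter estimates for Equation (1) are derived by substituting the elements of A and B into the
equations for P. The parameter estimates for Equation (2) are derived by substituting the following matrix elements,
which follow from Equation (2) in the same way, with each sum taken over all point pairs.

    B = \begin{bmatrix} \sum y_{h_i} \\ \sum y_{h_i} y_{r_i} \\ \sum y_{h_i} x_{r_i} \end{bmatrix}

    A = \begin{bmatrix} n & \sum y_{r_i} & \sum x_{r_i} \\
                        \sum y_{r_i} & \sum y_{r_i}^2 & \sum x_{r_i} y_{r_i} \\
                        \sum x_{r_i} & \sum x_{r_i} y_{r_i} & \sum x_{r_i}^2 \end{bmatrix}

    P = \begin{bmatrix} \Delta y \\ m_{yy} \\ m_{yx} \end{bmatrix}

This LSQ method computes a linear mapping that minimizes the total discrepancy (error) between all the
reference points on a registered form and their corresponding hypothesized points measured on an input form containing
distortion. Assuming that all points are reliably detectable, the error at any one point is decreased as the number of
points used in the Least Squares calculation increases, causing the registration quality to improve.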
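
The fit and the failure check of Figure 14 can be combined in a short C sketch. This is an illustration under stated assumptions, not the code in lsq3.c: the names Param3, solve3(), fit3(), and check_and_fit() are hypothetical, and the fit is oriented so that located (hypothesis) points are mapped onto reference points, which is the direction described in the walk-through of Figure 14. Swapping the two point sets in the call gives the direction in which Equations (1) and (2) are written. Squared distances are compared so that no square root is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Six distortion parameters (hypothetical container type). */
typedef struct { double dx, mxx, mxy, dy, myy, myx; } Param3;

/* Solve the 3x3 system A p = b with the determinant and adjoint formulas.
   Returns 0 if A is (numerically) singular, 1 otherwise. */
static int solve3(double a[3][3], const double b[3], double p[3])
{
    double det = a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0]
               + a[0][2]*a[1][0]*a[2][1] - a[2][0]*a[1][1]*a[0][2]
               - a[2][1]*a[1][2]*a[0][0] - a[2][2]*a[1][0]*a[0][1];

    if (det > -1e-12 && det < 1e-12)
        return 0;
    p[0] = (b[0]*(a[1][1]*a[2][2] - a[1][2]*a[2][1])
          + b[1]*(a[0][2]*a[2][1] - a[0][1]*a[2][2])
          + b[2]*(a[0][1]*a[1][2] - a[0][2]*a[1][1])) / det;
    p[1] = (b[0]*(a[1][2]*a[2][0] - a[1][0]*a[2][2])
          + b[1]*(a[0][0]*a[2][2] - a[0][2]*a[2][0])
          + b[2]*(a[0][2]*a[1][0] - a[0][0]*a[1][2])) / det;
    p[2] = (b[0]*(a[1][0]*a[2][1] - a[1][1]*a[2][0])
          + b[1]*(a[0][1]*a[2][0] - a[0][0]*a[2][1])
          + b[2]*(a[0][0]*a[1][1] - a[0][1]*a[1][0])) / det;
    return 1;
}

/* Least-squares fit of t[i] ~ p[0] + p[1]*u[i] + p[2]*v[i] by building
   the normal equations (the sums of Equations (3) through (5)). */
static int fit3(const double *u, const double *v, const double *t,
                int n, double p[3])
{
    double a[3][3] = {{0}}, b[3] = {0};
    int i;

    a[0][0] = n;
    for (i = 0; i < n; i++) {
        a[0][1] += u[i];        a[0][2] += v[i];
        a[1][1] += u[i]*u[i];   a[1][2] += u[i]*v[i];
        a[2][2] += v[i]*v[i];
        b[0] += t[i];  b[1] += t[i]*u[i];  b[2] += t[i]*v[i];
    }
    a[1][0] = a[0][1];  a[2][0] = a[0][2];  a[2][1] = a[1][2];
    return solve3(a, b, p);
}

/* Outlier-rejecting fit in the spirit of Figure 14. (hx, hy) are located
   points and (rx, ry) the matching reference points; the arrays are
   compacted in place as points are removed. Returns 1 and fills *out on
   success, 0 when too few points remain or the system is singular. */
int check_and_fit(double *hx, double *hy, double *rx, double *ry, int n,
                  int rm_limit, double err_limit, Param3 *out)
{
    assert(hx && hy && rx && ry && out);
    while (n > rm_limit) {
        double px[3], py[3], max_e2 = -1.0;
        int i, max_i = 0;

        if (!fit3(hx, hy, rx, n, px) || !fit3(hy, hx, ry, n, py))
            return 0;
        for (i = 0; i < n; i++) {
            double ex = px[0] + px[1]*hx[i] + px[2]*hy[i] - rx[i];
            double ey = py[0] + py[1]*hy[i] + py[2]*hx[i] - ry[i];
            double e2 = ex*ex + ey*ey;          /* squared error distance */
            if (e2 > max_e2) { max_e2 = e2; max_i = i; }
        }
        if (max_e2 <= err_limit * err_limit) {
            out->dx = px[0]; out->mxx = px[1]; out->mxy = px[2];
            out->dy = py[0]; out->myy = py[1]; out->myx = py[2];
            return 1;
        }
        n--;                                    /* drop the worst point */
        hx[max_i] = hx[n]; hy[max_i] = hy[n];
        rx[max_i] = rx[n]; ry[max_i] = ry[n];
    }
    return 0;
}
```

With the values given in the text, a caller would pass rm_limit = 3 and err_limit = 4.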

9.1.2.1.4 Transform Form Image; src/lib/image/fitimage.c; f_fit_param3_image2()

Using the method of Linear Least Squares, the parameter estimates \Delta x, m_{xx}, m_{xy}, \Delta y, m_{yy}, and m_{yx}
are substituted back into Equations (1) and (2) and the pixels in the input HSF form are transformed by computing (x_h, y_h).
Each black pixel in the input image is mapped or pushed to a new position within an initially blank output image. This
approach is efficient because it only computes a transformation for those pixels in the image that are black. A trade-off
to this approach is that the resulting output image may contain small amounts of speckle noise within dense black
pixel regions. The small white pixel voids are caused by round-off when converting real-valued transformation
addresses to discrete pixel locations. An alternative to pushing is pulling. In this case, an inverse transformation is
computed for every pixel position in the output image, and pixels are pulled from the input image to the output image. This
approach ensures complete coverage across the output image, and the speckle noise is avoided. Unfortunately, a
transformation is computed for every pixel in the image, making this approach computationally more expensive. In light of
this, hsfsys uses the efficient pushing approach. Upon completion, the input HSF form has been transformed to fit the
blank registered form tmplt/hsftmplt.pct and its spatial field template tmplt/hsftmplt.pts.
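
The pushing approach can be sketched in C as follows. This is a simplified illustration, not the routine in fitimage.c: push_transform() is a hypothetical name, images are assumed to be unpacked to one byte per pixel (nonzero = black), and the six parameters are assumed to have been fit in the direction that maps input pixel coordinates to registered coordinates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round a real-valued address to the nearest integer pixel position. */
static long round_nearest(double v)
{
    return (long)(v + (v >= 0.0 ? 0.5 : -0.5));
}

/* Push every black pixel of src to its transformed position in dst.
   Both images are w x h, one byte per pixel; dst is cleared to white
   first. Pixels that land outside the output image are dropped. */
void push_transform(const unsigned char *src, unsigned char *dst,
                    int w, int h,
                    double dx, double mxx, double mxy,
                    double dy, double myy, double myx)
{
    int x, y;

    assert(src != NULL && dst != NULL && src != dst);
    memset(dst, 0, (size_t)w * h);
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            long nx, ny;

            if (!src[(size_t)y * w + x])
                continue;                 /* white pixels are never computed */
            nx = round_nearest(dx + mxx * x + mxy * y);
            ny = round_nearest(dy + myy * y + myx * x);
            if (nx >= 0 && nx < w && ny >= 0 && ny < h)
                dst[(size_t)ny * w + nx] = 1;
        }
    }
}
```

The rounding step is where the speckle described above originates: two neighboring input pixels can round to positions that leave a one-pixel gap between them in the output.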

9.1.2.2 REMOVE FORM; src/lib/hsf/rmform.c; remove_form()

One approach to isolating the handprint entered on a form is to first remove the pixels comprising the form
itself. Then, all that remains in the image is handprinted data in the presence of some amount of noise. Upon
registration, the pixels making up the form in the input image are known to correspond to the pixels in tmplt/hsftmplt.pct.
Therefore, the image in tmplt/hsftmplt.pct can be used as a mask so that, when laid over the registered input image,
each pixel corresponding to a black pixel in the mask is erased from the input image.

9.1.2.2.1 Read Form Mask Image; src/lib/image/readrast.c; ReadBinaryRaster()

The LSQ method for form registration minimizes error, but does not absolutely remove all error. Detection
of a registration mark even within an undistorted input image may be somewhat inaccurate and there is always a certain
amount of discrete round-off error when implementing pixel-based transformations. Therefore, there will always be a
small amount of discrepancy between a registered input image and the ideal mask. To compensate for these small
amounts of error, the blank registered form image tmplt/hsftmplt.pct has been dilated four times and stored in
tmplt/hsftmplt.dd. This broadens all form structures in the blank form image so that coverage is improved when overlaid
with the registered input image. The file tmplt/hsftmplt.dd is a binary IHead image and is loaded into hsfsys using the same
routine, ReadBinaryRaster(), as is used to load the input HSF form image.

9.1.2.2.2 Subtract Form Pixels; src/lib/image/binlogop.c; nandbinimage()

The form is erased from the registered input image by applying the dilated blank form as a mask. A logical
NAND (NOT applied to the mask, followed by an AND with the input) is used. For each pixel in the input image, an
output pixel value is computed as follows:

    o = r & (~m)                                                             (9)

where o is the output pixel, r is the pixel from the registered input image, and m is the corresponding pixel from the
mask. In this way, o is set to black only when r is black and m is white. Remember that a black pixel has value 1 and
a white pixel has value 0.
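
Because the rasters are packed 8 pixels per byte, Equation (9) can be applied to whole bytes at a time. The following C sketch shows the idea; subtract_form() is a hypothetical name and the buffers are assumed to be uncompressed rasters of equal size.

```c
#include <assert.h>
#include <stddef.h>

/* Apply o = r & ~m to packed binary rasters, 8 pixels per byte.
   reg is the registered input image, mask the dilated blank form.
   out may be the same buffer as reg for in-place erasure. */
void subtract_form(const unsigned char *reg, const unsigned char *mask,
                   unsigned char *out, size_t nbytes)
{
    size_t i;

    assert(reg != NULL && mask != NULL && out != NULL);
    for (i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(reg[i] & ~mask[i]);
}
```

For a 2560 by 3300 form, nbytes would be 2560 / 8 * 3300 = 1056000.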

Upon image subtraction, characters in the registered input image may be left with holes and discontinuities.
This occurs when the characters written in a field overlap with information already printed on the form or when strokes
of characters extend across the form's lines or instructions. At the time of this software release, NIST has not yet
developed a complete solution to reconstructing disjoint strokes and holes in characters. However, initial experiments have
been conducted to study this issue, and further research is required.


9.2 DO DIGIT FIELDS; src/lib/hsf/field.c; do_digit_fields()

This section describes how fields containing handprinted digits are processed by the standard reference
recognition system. First, information must be loaded into the system to support feature extraction and the recognition of
handprinted digit images. The handprint within a particular field is then extracted, segmented, size-normalized, and
slant-normalized. Features are extracted from each segmented character image, and the features are classified. The
results from the classification are stored field by field and include both the hypothesized digit identifications and their
associated confidence values. Figure 15 lists the steps used to process digit fields, and the figure cross-references the
steps to the software distribution according to file and subroutine name.


9.2.1 INITIALIZE FOR FIELDS; src/lib/hsf/field.c; init_field()

Three files are necessary to process fields that contain handprinted digits. The first file contains a set of basis
functions that are used to compute feature coefficients from each segmented digit image. Hsfsys uses the Karhunen
Loeve (KL) transform to compute these features. KL basis functions have been computed off-line and stored in the
file weights/td3_d.evt. The second file needed to process digit fields, weights/td3_d.pat, contains prototype KL feature
vectors and a search tree used by an optimized Probabilistic Neural Network Classifier (PNN). The third file,
weights/td3_d.med, contains class-based median vectors computed from the prototypes in the pat file. If the small memory
mode (the -m option) is used to invoke hsfsys, a smaller set of prototypes and their associated files are loaded instead.
These files begin with the root file name td3_d_s in the top-level directory weights. This section describes how basis
functions, prototype vectors, and median vectors are computed and how they are stored in their respective files.

9.2.1.1 READ BASIS FUNCTIONS; src/lib/nn/basis_io.c; read_basis()

The KL transform of a segmented character image is obtained by projecting the image onto the orthonormal
eigenvectors of the covariance matrix of a large number of prototype images. The prototype images are representative
of the types of images desired to be recognized by the recognition system, in this case, images of segmented digits.
The KL transform requires computing the covariance matrix, and then diagonalizing it to produce the eigenvectors.
The resulting eigenvectors can be used as basis functions for feature extraction. Computing the KL transform is very
expensive, but it is done once off-line, and the eigenvectors are stored in a basis function file for use in the recognition
system. Appendix A documents the program mis2evt that computes the covariance matrix and eigenvectors from a
training set of segmented and labeled character images. The output from this program is an evt file.

All elements of the basis function file occupy 4 bytes and are read-writable using the unformatted binary C
functions fread and fwrite. The supplied basis function files found in the top-level distribution directory weights are
assigned the extension evt. These files were written using C source code running on a computer that uses the Motorola
(high-low) byte order format. Users of other computer architectures should be aware that byte orders may need to be
changed for correct reading on their specific equipment.

The basis functions for the KL transform are eigenvectors, so these terms are used interchangeably in this
document. The first element in the file is the integer number, n, representing the number of eigenvectors stored in the
file. The second element is the integer dimensionality, D, of the eigenvectors. The remainder of the file consists of n+2
vectors, each with D elements. The first vector of length D contains the mean image vector of all of the images used
to build the covariance matrix. The second vector exists for historical purposes only and has all elements equal to 1.0.
The final n vectors are the eigenvectors of the covariance matrix, and they are stored in the order of decreasing
eigenvalue.


The  following  items  should  be  noted.  The  order  of  the  elements  within  the  eigenvectors  corresponds  to  row 
major  ordering  of  the  image  pixels.  The  ordering  of  the  eigenvectors  according  to  decreasing  eigenvalue  improves  the 
efficiency  of  the  PNN  classifier.  Finally,  the  images  used  in  building  the  covariance  matrix  are  32  X 32  pixels  in  size, 
resulting  in  a dimensionality  of  D = 1024,  which  is  fixed  throughout  the  implementation  of  hsfsys. 
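
The layout of a basis function file, and the byte order issue mentioned above, can be illustrated with a small C sketch. The helper names are hypothetical; the only facts taken from the text are the 4-byte element size, the high-low byte order, the two leading integers n and D, and the n+2 vectors that follow.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a 4-byte integer stored high byte first (Motorola order),
   independent of the byte order of the machine running the code. */
unsigned long read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Expected size in bytes of a basis function file holding n eigenvectors
   of dimension d: two 4-byte integers, then n + 2 vectors of d elements. */
unsigned long basis_file_size(unsigned long n, unsigned long d)
{
    return 8UL + (n + 2UL) * d * 4UL;
}

/* Byte offset of eigenvector k (0-based): skip the two header integers,
   the mean image vector, and the historical vector of 1.0 values. */
unsigned long eigvec_offset(unsigned long k, unsigned long d)
{
    return 8UL + (2UL + k) * d * 4UL;
}
```

Comparing basis_file_size() against the actual file length is one inexpensive way to detect that the header integers were read with the wrong byte order.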

9.2.1.2 READ PROTOTYPES & TREE; src/lib/nn/pat_io.c; readpatstreefile()

The features produced by projecting segmented character images onto the KL eigenvectors have been studied
extensively by NIST. A set of KL coefficients is computed by applying a set of eigenvectors to the same image.
The image is then represented by the vector of coefficients rather than by its pixel data. The feature vectors are
computed from a large training set of segmented character images and can be used to train classifiers such as PNNs. A large
number of prototypes, tens of thousands, are required to train these classifiers, so they are computed off-line and stored
in a prototype file.

    3.2.3  STORE FIELD RESULTS          src/lib/fet/updatfet.c; updatefet()
    End Loop
    3.2.4  DEALLOCATE FOR FIELDS        src/lib/hsf/field.c; free_field()

Figure 15. Steps to process digit fields.

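The projection of an image onto the KL eigenvectors, described in this section, amounts to one dot product per eigenvector. The following C sketch illustrates it under stated assumptions: kl_project() is a hypothetical name, all vectors are held as float arrays, the eigenvectors are stored contiguously in row-major order, and the mean image vector is subtracted before projecting (the usual convention, which the text does not state explicitly).

```c
#include <assert.h>
#include <stddef.h>

/* Compute n KL coefficients of a d-element image vector:
       coef[k] = sum over j of eig[k*d + j] * (img[j] - mean[j])
   eig holds n eigenvectors of length d, one after another. */
void kl_project(const float *img, const float *mean, const float *eig,
                int n, int d, float *coef)
{
    int k, j;

    assert(img != NULL && mean != NULL && eig != NULL && coef != NULL);
    assert(n > 0 && d > 0);
    for (k = 0; k < n; k++) {
        double acc = 0.0;   /* accumulate in double to limit round-off */

        for (j = 0; j < d; j++)
            acc += (double)eig[(size_t)k * d + j] *
                   ((double)img[j] - (double)mean[j]);
        coef[k] = (float)acc;
    }
}
```

For hsfsys, d would be 1024 (a 32 x 32 image in row-major order) and n the number of eigenvectors read from the evt file.
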


Like the basis function files, prototype files are read-writable using the unformatted C functions fread and
fwrite. The supplied prototype files found in the top-level distribution directory weights are assigned the extension pat.
These files were written using C source code running on a computer that uses the Motorola byte order format. Just as
with basis function files, users of other computer architectures should be aware that byte orders may need to be changed
for correct reading on their specific equipment. The prototype file format was originally implemented in FORTRAN,
which embeds integer stream lengths at the beginning and end of a byte block of data. Therefore, the low-level reading
routines in C are complicated by this feature.

The first element in a prototype file is a 4-byte integer always having a value of 24. It indicates that six 4-byte
integers follow. These six integers constitute the file’s header, and currently only four of the integers are actually used.
The first of the six integer elements, F, represents the number of feature vectors stored in the file; the second element,
n, signifies the dimensionality of the feature vectors; the third element, L, is the number of possible classes to which the
vectors may belong; the fourth element is not used; the fifth element indicates the format used in the file and must be
a value of 5151; and the sixth element is currently unused. After the six integers, the initial block size integer with
value 24 is repeated.
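
The header layout above can be sketched in C. This is a minimal illustration, not the code in pat_io.c: the names PatHeader, be32, and parse_pat_header are hypothetical, and the sketch assumes the 32-byte header record has already been read into memory in its stored (big-endian) byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four header integers that are used. */
typedef struct {
    int32_t nvecs;    /* number of feature vectors in the file     */
    int32_t ndim;     /* n: dimensionality of each feature vector  */
    int32_t nclasses; /* L: number of possible classes             */
    int32_t format;   /* format code, expected to be 5151          */
} PatHeader;

/* Assemble a 32-bit integer from 4 bytes stored most significant first,
   so the result is the same on any host byte order. */
static int32_t be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)v;
}

/* Parse the header record: length marker 24, six integers, marker 24.
   Returns 0 on success, -1 if the buffer is short or a check fails. */
static int parse_pat_header(const unsigned char *buf, size_t len, PatHeader *h)
{
    if (len < 32)
        return -1;
    if (be32(buf) != 24 || be32(buf + 28) != 24)
        return -1;                 /* FORTRAN-style length markers must match */
    h->nvecs    = be32(buf + 4);
    h->ndim     = be32(buf + 8);
    h->nclasses = be32(buf + 12);
    /* buf + 16 holds the fourth integer, which is not used */
    h->format   = be32(buf + 20);
    /* buf + 24 holds the sixth integer, which is not used */
    return (h->format == 5151) ? 0 : -1;
}
```

Assembling each integer byte by byte avoids a separate byte-swapping pass on little-endian machines.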

The section following the header in a prototype file contains the class-set that identifies each class with a
user-defined string. The data block starts with a 4-byte integer assigned the value 32 × L. The class set strings follow with
L strings each of length 32 bytes (they do not need to be null terminated). The same integer data length of 32 × L
concludes the class-set section.

The largest section of the file follows the class set and contains KL feature vectors. The n elements of one
vector are followed by the next for a total of F vectors. These floating point feature vectors are most conveniently input
using a single fread of 4 × F × n bytes into a preallocated block of contiguous memory. The feature vectors are followed
by a vector of integer indices on the range [0, L-1]. These indices identify the class of each feature vector stored in the
file and can be read as a single block of 4 × F bytes.

The final section in a prototype file contains a search tree that is used to minimize the computational intensity
of a traditional PNN classifier. This tree contains indices pointing to the feature vectors stored in the file. Therefore,
the ordering of these features is important and must remain fixed. For this reason, the tree is included in the prototype
file. The tree section begins with two 4-byte integers. The first integer is the number of nodes, N, in the tree. The second
integer is the number of children per node, C. A matrix containing N × C 4-byte integers follows. The matrix is
followed by five vectors each containing N 4-byte elements. The first three vectors contain integers, while the last two
hold floating point values. Section 9.2.2.6 will discuss the use of this tree in more detail, and Appendix B documents
the program mis2pat that generates prototype files.

9.2.1.3 READ MEDIAN VECTORS; src/lib/nn/median_io.c; readmedianfile()

The program mis2pat produces a second file that is needed for character classification. Median vector files are
supplied in the top-level distribution directory weights and are assigned the extension med. A median file contains as
many vectors as there are classes in an application. For example, there are 10 classes when recognizing digits as
compared to 26 classes when recognizing upper case letters. Each median vector has the same dimensionality D as the
feature vectors stored in a corresponding prototype file. The j-th element of the i-th vector contains the center value from
the sorted list of the j-th elements from all the training feature vectors for class i. The median vectors are used during
classification to initialize the search through the tree stored in a corresponding prototype file.

The median vectors are stored in an ASCII file format. The first line of the file contains two space-separated
integers. The first integer specifies the number of vectors in the file; the second integer specifies the number of elements
in each vector. The floating point vectors follow, one after another, with a blank line separating each vector. The order
of these vectors corresponds to the class indices stored in a corresponding prototype file.
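
A reader for this ASCII layout might look like the following sketch. It is not the code in median_io.c: the function name parse_medians is hypothetical, and the sketch assumes the whole file has already been loaded into a null-terminated string. Because strtol and strtod skip leading white space, the blank lines between vectors need no special handling.

```c
#include <stdlib.h>

/* Parse "nvecs ndim" followed by nvecs * ndim floating point values.
   Returns a malloc'd array stored vector after vector, or NULL on error.
   The caller frees the result. */
static float *parse_medians(const char *text, int *nvecs, int *ndim)
{
    char *end;
    long nv, nd, i, total;
    float *data;

    nv = strtol(text, &end, 10);
    if (end == text)
        return NULL;
    text = end;
    nd = strtol(text, &end, 10);
    if (end == text || nv <= 0 || nd <= 0)
        return NULL;
    text = end;

    total = nv * nd;
    data = malloc((size_t)total * sizeof *data);
    if (data == NULL)
        return NULL;

    for (i = 0; i < total; i++) {
        data[i] = (float)strtod(text, &end);
        if (end == text) {          /* ran out of numbers early */
            free(data);
            return NULL;
        }
        text = end;
    }
    *nvecs = (int)nv;
    *ndim = (int)nd;
    return data;
}
```

With this storage order, element j of the median vector for class i is data[i * ndim + j].
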
9.2.2 PROCESS DIGIT FIELD; src/lib/hsf/field.c; process_digit_field()


With the input HSF form image registered and the form information removed, the handprint entered within
each field can be extracted using the spatial field template. The field subimage is then segmented, separating each
handprinted digit into its own image. Handprint varies widely in size and slant between different writers, so each segmented
digit image is normalized so that the character is scaled to a consistent size, and the character is straightened to remove
slant. Features are extracted from each character image so that an image is represented by a vector of floating point
coefficients rather than by its binary pixels. These feature vectors are classified by a neural network, and the
hypothesized digit classifications along with their associated confidence values are stored.

9.2.2.1 ISOLATE 1-LINE FIELD; src/lib/hsf/isolate.c; iso_1line_field()

Through the process of form registration, the input HSF form has been transformed to line up with the spatial
field template stored in tmplt/hsftmplt.pts. This spatial template defines the location and spatial extent of each entry
field on the HSF form. Each field region is represented by 4 pixel coordinate points representing the corners of a
rectangle that is aligned with the raster grid in the input image. These rectangular template coordinates are used to extract
subimages of the fields from the registered input image. Figure 16 contains an example of an extracted field (scaled up
4X). Notice that in addition to the handprint there is still a small amount of form information that was not erased during
form removal. Spatial histograms similar to those used in locating registration marks are used to separate the
handprinted data from the form data.

The technique used works on the assumption that the entry field contains a single line of handprinted text,
and that the handprint can be distinguished from the edges of the form by examining pixel densities within a spatial
histogram projection. A horizontal histogram computed on the example field image is displayed in Figure 17.


Figure 16. Form residue in an isolated field.

Figure 17. Horizontal histogram of image displayed in Figure 16.

Figure 18. Cropped field image containing only handprint.

Figure 19. Technique for locating the vertical center of a line of handprinted text.

In order to locate the handprinted text, the histogram values plotted in Figure 17 are converted to a vector of
incremental run length values. This process is illustrated in Figure 19. The first column contains a list of row indices
for the example image. The second column of numbers lists the histogram values whose positions within the list are
called bins. Notice that the lists have been shortened at points of contiguous zeros so that they fit within the figure. All
those bins having an accumulated value of more than 2 pixels are set to 1, and those bins having less than 2 pixels are set
to 0. The resulting binary vector is listed in the third column of the figure. A run length counter is initialized for each
contiguous group of binary values equal to 1, and each subsequent value of 1 in a run is replaced by the current run
length counter, and the counter is then incremented. The results of this step are shown in the fourth column. Finally,
the run with the longest length is selected, and the midpoint of the run is determined to represent the vertical middle
of the handprinted text within the field. In the example shown in Figure 19, the second run with length 38 is selected,
and raster row 59 is determined to approximate the middle of the handprinted text.

Given the approximate middle of the handprinted text line, the histogram bins in the second column of Figure
19 are searched to locate the edges of the text. One search starts at the approximate middle and proceeds upwards until
a bin equal to 0 is encountered, and in a similar way a second search starts at the approximate middle and proceeds
downwards until a bin equal to 0 is encountered. The two points at which the bins become zero are assumed to
correspond to the top and bottom edges of the handprinted text line. Image rows between these two points are extracted, and
any form residue is cropped.
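
The thresholding, run length, and edge search steps can be sketched together in C. This is an illustration only, not the code in isolate.c: the function name find_text_rows is hypothetical, a bin of exactly 2 pixels is assumed to threshold to 0, and the returned rows are the outermost nonzero bins (inclusive) rather than the zero bins themselves.

```c
/* Locate the top and bottom rows of a text line from a horizontal
   histogram hist[0..nrows-1] holding the black pixel count of each row.
   Returns 0 and sets *top and *bottom, or -1 if no bin exceeds 2. */
static int find_text_rows(const int *hist, int nrows, int *top, int *bottom)
{
    int r, run = 0, best_len = 0, best_end = -1, mid;

    /* Threshold each bin and keep a running length counter; remember
       the longest run of bins that thresholded to 1. */
    for (r = 0; r < nrows; r++) {
        if (hist[r] > 2) {
            run++;
            if (run > best_len) {
                best_len = run;
                best_end = r;
            }
        } else {
            run = 0;
        }
    }
    if (best_len == 0)
        return -1;

    /* The midpoint of the longest run approximates the text middle. */
    mid = best_end - best_len / 2;

    /* Search the raw histogram upwards, then downwards, until the
       neighboring bin is zero. */
    r = mid;
    while (r > 0 && hist[r - 1] != 0)
        r--;
    *top = r;

    r = mid;
    while (r < nrows - 1 && hist[r + 1] != 0)
        r++;
    *bottom = r;
    return 0;
}
```

Note that the edge search uses the raw bins, so rows with only 1 or 2 black pixels adjacent to the text are still kept as part of the line.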

The one-line fields on the HSF form are much wider than they are tall, so form residue is much more common
along the horizontal sides of these fields than along the vertical sides. The longer horizontal sides permit small amounts
of registration error to propagate until it becomes significant, whereas the error along the shorter vertical sides is
seldom propagated to the extent that it is noticeable. Therefore, the process of thresholding the histogram bins and
computing run length increments is done only for locating the top and bottom edges of the handprint within the field. Left
and right edges are found by searching vertical histogram bins directly. Searching in from a left or right edge, the first
histogram bin greater than 10 pixels is found, and then a reverse search from that point locates the first bin that equals
zero. The use of a 10 pixel threshold avoids speckle noise in the field and locates the beginning of significant character
data, while the reverse search locates the edge of the character data. Hsfsys extracts the subimage bounded by these
left, right, top, and bottom edges, and the result of cropping the image in Figure 16 is shown in Figure 18.
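
The left and right edge searches can be sketched as follows. The function names are hypothetical, and the sketch returns the column just inside the first zero bin found by the reverse search, which is an assumption about how the edge is reported.

```c
/* Find the left edge of the character data in a vertical histogram
   vhist[0..ncols-1] (black pixel count per column). Returns -1 if no
   bin exceeds the 10 pixel threshold. */
static int find_left_edge(const int *vhist, int ncols)
{
    int c = 0;

    /* Search in from the left for the first significant bin. */
    while (c < ncols && vhist[c] <= 10)
        c++;
    if (c == ncols)
        return -1;

    /* Reverse search back toward the left until the next bin is zero. */
    while (c > 0 && vhist[c - 1] != 0)
        c--;
    return c;
}

/* Mirror image of the search above, starting from the right side. */
static int find_right_edge(const int *vhist, int ncols)
{
    int c = ncols - 1;

    while (c >= 0 && vhist[c] <= 10)
        c--;
    if (c < 0)
        return -1;

    while (c < ncols - 1 && vhist[c + 1] != 0)
        c++;
    return c;
}
```

Isolated low-count columns near the field border (speckle) never reach the threshold, so they fall outside the returned edges.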

9.2.2.2 SEGMENT DIGIT FIELD; src/lib/hsf/segblob.c; segbinblobdigits()

At this point, hsfsys has a subimage containing the handprint of one or more digits. The feature extraction and
classification techniques used by the recognition system are designed to classify images containing a single character.
Therefore, the field image of multiple characters must be segmented into plausible character images, one character per
image. To do this, the system uses connected components or blobs to define these plausible character images. A blob
is defined to be a group of pixels all contiguously neighboring or connecting each other. In general, each blob is
extracted and assumed to be a separate character, although a blob is not guaranteed to be a single and complete
character. This is frequently the case with handprinted fives. Often a writer will print the top horizontal stroke of a 5 so that
it does not connect to the bottom portion of the digit. In this case, the two pieces of the same five will be treated incorrectly
as two independent characters. To avoid this type of problem, a blob pasting step has been developed.

Based on experience gained from creating and manipulating large on-line image databases such as SD3, NIST
has developed a number of diversified structures and file formats. Storing character images in individual files has
proven to be very inefficient, especially when manipulating databases containing hundreds of thousands of characters.
Devoting a separate file node for each character image creates enormous file system overhead, and unreasonably large
directory tables must be allocated. Rarely do recognition system components process only a single character image in
isolation. Rather, most components are designed to process a large sample of characters. Experience has shown that
the gathering of a large sample of characters from a file system where the images have been stored in individual files
greatly burdens the computer’s disk controller. This results in slow experiment loading times as well as limiting the
access of other applications to data stored on the same storage device.

In addition to creating large directory tables, storing segmented character images in individual files results in
sparse usage of the storage device. This sparseness is even more exaggerated when the images are compressed. For
example, segmented character images in SD3 have been centered within a 128 by 128 binary pixel image. The resulting
image size is 2,344 bytes, 296 bytes for the IHead header and 2,048 bytes of image data. These files when CCITT
Group 4 compressed average 360 bytes in size, 296 bytes for the IHead header and only 64 bytes of compressed image
data. Storing these compressed image files onto CD-ROM for example, which uses a 2,048 byte block size, would be
extremely wasteful. Only 18% of each block containing image data would be used.

In light of these observations the segmentor used by hsfsys creates a memory structure called a Multiple Image
Set (MIS). The MIS structure and file format have been developed to manage large volumes of segmented character
images. The format allows multiple images of homogeneous dimensions and depth to be stored in one file. MIS
is a simple extension or encapsulation of the IHead format described in Section 9.1.1.1. It can be seen in Figure 20 that
the IHead structure is included as a member within the MIS definition. The library src/lib/mis contains a suite of
routines designed to read and write MIS files and manipulate MIS structures.

An MIS file contains one or more individual images stacked vertically within the same contiguous raster
memory; the last pixel row or scanline of the previous image is concatenated to the first scanline of the next image. The
individual images that are concatenated together are referred to as MIS entries. The resulting contiguous raster
memory is referred to as the MIS memory. The MIS memory containing one or more entries of uniform width, height, and
depth is stored using the IHead file format. The IHead attribute fields are sufficient to describe the MIS memory. The
IHead structure’s width attribute specifies the width of the MIS memory, and likewise the IHead structure’s height
attribute specifies the height of the MIS memory. In this way, the MIS memory can be stored just like any normal IHead
image, including possible compression. An example of an MIS memory is displayed in Figure 21 (scaled up 3X). In
this example, each extracted character is centered within a 128 by 128 MIS entry.

Due to the uniform dimensions of MIS entries, the IHead structure’s width attribute also specifies the width
of the entries in the MIS memory. What is lacking from the original IHead definition is the uniform height of the MIS
entries and the number of MIS entries in the MIS memory. Notice that, given the uniform height of the MIS entries,
the number of entries in the MIS memory can be computed by dividing the entry height into the total MIS memory
height. The interpretation of two of the IHead attribute fields, par_x and par_y, changes when the IHead header is
being used to describe an MIS memory. The par_x field is used to hold the uniform width of the MIS entries, and the
par_y field is used to hold the uniform height of the MIS entries. In other words, width and height represent MIS
memory width and MIS memory height respectively, while par_x and par_y represent MIS entry width and MIS entry
height respectively. Using this convention, an MIS file is treated like an IHead file.
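
The entry arithmetic implied by this convention can be sketched in C. The helper names below are hypothetical and are not part of src/lib/mis. The pointer calculation assumes an uncompressed binary raster packed 8 pixels per byte with each scanline occupying a whole number of bytes, which matches the 2,048 bytes quoted earlier for a 128 by 128 binary image.

```c
#include <stddef.h>

/* Number of entries = MIS memory height divided by the entry height. */
static int mis_entry_count(int mish, int enth)
{
    return (enth > 0) ? mish / enth : 0;
}

/* Address of the first scanline of entry i in a packed binary MIS
   memory of width misw pixels (assumed layout, see the text above). */
static unsigned char *mis_entry_ptr(unsigned char *data, int misw,
                                    int enth, int i)
{
    size_t row_bytes = ((size_t)misw + 7) / 8;   /* bytes per scanline */
    return data + (size_t)i * (size_t)enth * row_bytes;
}
```

For example, a memory 128 pixels wide and 640 pixels high with 128 pixel entries holds 5 entries, and entry 2 begins 4,096 bytes into the raster.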


Filename: Mis.h
Author: Michael D. Garris
Date: 7/18/90


typedef struct misstruct{
IHEAD *head;
unsigned char *data;
int misw;
int mish;
int entw;
int enth;
int ent_num;
int ent_alloc;

}MIS;


Figure  20.  C definition  for  the  MIS  structure. 
Figure 21. An example of an MIS memory segmented from the field image in Figure 18.

Figure 20 lists the MIS structure definition written in C and found in include/mis.h. The structure contains an
IHead structure, head, and an MIS memory, data. In addition, there are six other attribute fields that hide the details
of the IHead interpretation from application programs that manipulate MIS memories. The MIS attributes misw and
mish specify the width and height of the MIS memory. These values are the same as the width and height attributes
contained in the IHead structure pointed to by head. The MIS attributes entw and enth specify the uniform width and
height of the MIS entries. These values are the same as the par_x and par_y attributes contained in the IHead structure
pointed to by head. The MIS attribute, ent_alloc, specifies how many MIS entries of dimension entw and enth have
been allocated to the MIS memory data. The MIS attribute, ent_num, specifies how many entries out of the possible
number allocated are currently and contiguously contained in the MIS memory data.

In addition to extracting character images, the segmentor used by hsfsys computes the location where each
character was extracted from the field image, and also computes the character’s width and height. These statistics are
listed with the MIS memory in Figure 21.

9.2.2.2.1 Extract Blobs; src/lib/hsf/segblob.c; segbinblob()

A serial implementation of a connected component algorithm called findblob() has been developed that is
relatively inexpensive to compute and is included in this software distribution in the file src/lib/image/findblob.c. This
utility finds a single blob from the input image and returns a subimage containing the blob. By calling the utility
repeatedly, one can obtain all the blobs in the input image, or if desired, just some of the blobs. Each call to findblob() initiates
a search that begins at a specified starting point in the input image and proceeds to scan the input image in
column-major (top-to-bottom and left-to-right) order for a black pixel. Once found, the black pixel is grown into a complete
blob region. The caller may leave the current point of the scan unchanged between calls, thereby making a complete
scan that finds all blobs upon subsequent calls, or the caller may change the starting point so as to direct the search to
specific regions within the input image.

Findblob() is extremely flexible and has been designed with a number of different options. These options
control the clearing of blobs from the input image, the allocation of memory for the output image, and the format of the
output image. Input image pixels that are members of a detected blob can either be left alone or erased. The caller can
either provide the space needed for the output image or let the utility allocate the required amount of memory. Finally,
there are three available output format options. In the first format, the blob is returned in an output image the same size
and dimension as the input image. In this case, the blob occupies the same position in the full-size output image as it
did in the input image. The second format returns the blob centered in an image whose dimensions are specified by the
caller. The final format option causes the blob to be returned in an image allocated just large enough to contain it. In
other words, the output image is defined to be the bounding box around the blob.

Starting at a specified pixel position, the utility scans the input image for a black pixel. When a black pixel is
found, it is grown into a run. Here, a run is a maximal horizontal segment of contiguous black pixels. The run is then
grown into a maximal set of connected runs, which constitutes an entire blob. During the growth process, bytes
representing pixels of the blob are changed from ones to zeros, and structures representing the runs are stored in an array.
The pixels must be changed to avoid finding them again. (If the caller opts not to have these pixels changed upon return,
then they are set back just prior to exiting the utility.)

The array of runs is treated as a queue. One growth step consists of reading a run from the head of the queue,
producing new runs if there are any black pixels connected to its top or bottom, and appending these new runs to the
queue tail. The queue is initialized to just the seed run that is grown from the position of the first black pixel
encountered during the column-major scan. The growth steps continue until the queue becomes empty. The tail of the queue
does not wrap around so as to recycle array positions (as is typical with most queue implementations). Instead, head
and tail pointers both move toward higher addresses, so that when the growth is finished, the array contains all elements
that have ever been in the queue. The routine then systematically goes through all the run structures and sets their
corresponding pixels to black in the output image. The output image, representing one blob, is then returned to the caller.
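
The run-queue growth described above can be sketched in C. This is a simplified illustration and not the findblob() source: the names Run, find_seed, push_run, and grow_blob are hypothetical, the image is assumed to hold one byte per pixel (1 = black), the scan always starts at the upper-left corner, only the full-size output format is produced, and 8-connectivity between rows is assumed. The queue is also sized for the worst case, which is simpler but uses more working memory than the real utility is described as needing.

```c
#include <stdlib.h>

typedef struct { int y, x0, x1; } Run;   /* one horizontal run of black pixels */

/* Column-major scan for the first black pixel; returns 1 if found. */
static int find_seed(const unsigned char *img, int w, int h, int *sx, int *sy)
{
    int x, y;
    for (x = 0; x < w; x++)
        for (y = 0; y < h; y++)
            if (img[y * w + x]) { *sx = x; *sy = y; return 1; }
    return 0;
}

/* Extend black pixel (x, y) to a maximal run, erase it, append to queue. */
static void push_run(unsigned char *img, int w, int x, int y, Run *q, int *tail)
{
    int x0 = x, x1 = x, i;
    while (x0 > 0 && img[y * w + x0 - 1]) x0--;
    while (x1 < w - 1 && img[y * w + x1 + 1]) x1++;
    for (i = x0; i <= x1; i++) img[y * w + i] = 0;   /* avoid finding it again */
    q[*tail].y = y; q[*tail].x0 = x0; q[*tail].x1 = x1;
    (*tail)++;
}

/* Grow the blob containing black pixel (sx, sy). The blob is erased from
   img and drawn into out (same size, zeroed by the caller). Returns the
   number of black pixels in the blob, or -1 on allocation failure. */
static int grow_blob(unsigned char *img, unsigned char *out,
                     int w, int h, int sx, int sy)
{
    /* A row holds at most (w + 1) / 2 maximal runs. */
    Run *q = malloc((size_t)((w + 1) / 2) * (size_t)h * sizeof *q);
    int head = 0, tail = 0, count = 0, x, dy, i;

    if (q == NULL) return -1;
    push_run(img, w, sx, sy, q, &tail);           /* seed run */
    while (head < tail) {                         /* head and tail only advance */
        Run r = q[head++];
        for (dy = -1; dy <= 1; dy += 2) {         /* row above, then row below */
            int y = r.y + dy;
            int lo = r.x0 > 0 ? r.x0 - 1 : 0;
            int hi = r.x1 < w - 1 ? r.x1 + 1 : w - 1;
            if (y < 0 || y >= h) continue;
            for (x = lo; x <= hi; x++)
                if (img[y * w + x]) push_run(img, w, x, y, q, &tail);
        }
    }
    for (i = 0; i < tail; i++)                    /* array holds every run queued */
        for (x = q[i].x0; x <= q[i].x1; x++) { out[q[i].y * w + x] = 1; count++; }
    free(q);
    return count;
}
```

Calling find_seed and grow_blob repeatedly until find_seed returns 0 yields the blobs in left-to-right order, since each grown blob is erased from the input.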

The routine is efficient because it localizes processing to only the black pixels in the image, and it does so one
blob at a time. In addition, the algorithm’s implementation generally requires an amount of working memory that is
small compared to the memory occupied by the input image. The blobs returned by this utility are treated as plausible
character images. If a blob is too small it is thrown away and ignored. If a blob is too big it currently is also thrown
away. A future refinement to this segmentor would be to try to break any large blob down into smaller pieces because
it is likely to contain multiple characters connected to each other. In hsfsys, if a blob has less than 100 black pixels it
is considered too small, and if a blob has more than 1750 black pixels it is considered too big.

9.2.2.2.2 Paste Digit Blobs; src/lib/hsf/segblob.c; paste_digit_blobs()

Unfortunately, using connected components for segmentation has some significant pitfalls. A blob is not
guaranteed to be a single and complete character. If two characters touch, then a single blob will contain both characters as
a single composite image. A blob may also contain only one stroke of a character that is comprised of several disjoint
pieces. For example, the top of the letter T may not be connected to the vertical stroke, causing the algorithm to
over-segment the character into two blobs.

Figure 22 shows a field containing “DAuGhter” in which connected component labeling over-segments and
under-segments the field. The extracted field image is shown at the top, and the resulting blobs are listed below it. The
first blob is a vertical stroke that when viewed independently looks like a 1, l, or I. This blob is the vertical stroke
representing the left portion of the first letter D. This is an example of over-segmenting. The remaining three blobs are
examples of under-segmenting. The second blob contains portions of D, A, and u. In this example, the single blob is
assigned a class of X by the recognition system’s character classifier because the blob is assumed to be a single
character. The third blob contains both the G and h and is assigned a class of G. The h is deleted from the field. The fourth
blob contains t, e, and a portion of a clipped r. This blob is incorrectly assigned a class of W. Due to segmentation
errors introduced by using connected components, the field is recognized as “HXGW” rather than “DAUGHTER”.


Figure  22.  An  example  of  over  and  under-segmenting  using  connected  components. 

The problem of over-segmentation does occur when using connected components to segment digits. For
example, a writer will often print the top horizontal stroke of a 5 so that it does not connect to the bottom portion of
the digit. The two pieces of the same 5 will be treated incorrectly as two independent characters. To avoid this type of
problem, a method of blob pasting has been developed.

The connected component utility extracts blobs in a column-major order, so blobs are extracted left-to-right
within a one-line text field. In addition to the blob images, the utility returns the location from where the blob was
extracted in the field along with the blob’s width and height approximated by a bounding rectangle. Examples of these
statistics are listed in Figure 21. At times it becomes necessary to join two blobs together as with the top and bottom
pieces of a disjoint five. The decision to join two blobs is based on a simple heuristic that tests neighboring blobs. The
heuristic tests the current blob with its neighbor, the next blob. If the next blob’s bottom minus
the current blob’s top is less than half the current blob’s height, then the two blobs are pasted back together as a new
plausible character image. This simple heuristic works very well at putting the tops back on disjoint fives.
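
The pasting test can be written as a short C sketch. The names BlobBox and should_paste are hypothetical, and the sketch assumes that row coordinates increase downward, that y is the top row of the bounding rectangle, and that the bottom is y + h.

```c
/* Hypothetical bounding rectangle of an extracted blob within the field. */
typedef struct { int x, y, w, h; } BlobBox;

/* Returns 1 if the next blob should be pasted onto the current blob:
   (next bottom - current top) < half the current blob's height. */
static int should_paste(const BlobBox *cur, const BlobBox *next)
{
    int next_bottom = next->y + next->h;
    int cur_top = cur->y;

    /* Multiply by 2 instead of halving to stay in integer arithmetic. */
    return 2 * (next_bottom - cur_top) < cur->h;
}
```

Under these assumptions, a detached top stroke of a 5 ends near the top of the digit body, so the difference is small or negative and the test passes; a full-height neighboring digit ends near the body's bottom and fails the test.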

9.2.2.3  NORMALIZE  CHARACTER  IMAGES;  src/lib/hsf/normaliz.c;  norm_2nd_gen() 

To improve the classification performance of character images, a size-normalization technique referred to as
second generation normalization was developed. Handprinted characters come in all different shapes, sizes, and slants.
Some people will use all the space provided on a form, and others will use as little space as possible. It has been
observed that the same characters printed by the same writer can vary greatly in size. As a writer begins to run out of
room, he will do all types of curious things to fit the remainder of his answer in the field.

Second generation normalization attempts to remove size variations in handprint by scaling all segmented
character images to a consistent size. The scaling is handled by an efficient serial zoom() utility provided with the
software distribution in src/lib/image/zoom.c. The normalization method bounds the character data within a segmented
image with a box, and that box is scaled to fit exactly within a 20 by 32 pixel region; the aspect ratio of the original
character is not preserved. The resulting 20 by 32 pixel character is then centered within a 32 by 32 pixel image.
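
The bounding, scaling, and centering steps can be sketched in C. This sketch does not reproduce zoom(), whose algorithm is not described here; it substitutes simple nearest-neighbor sampling. The function name normalize_size is hypothetical, and the images are assumed to hold one byte per pixel (1 = black).

```c
/* Scale the bounding box of the black pixels in src (w by h) to 20 by 32
   and center it horizontally in the 32 by 32 image dst.
   Returns 0 on success, -1 if src contains no black pixels. */
static int normalize_size(const unsigned char *src, int w, int h,
                          unsigned char dst[32 * 32])
{
    int x, y, x0 = w, y0 = h, x1 = -1, y1 = -1, bw, bh;

    /* Bound the character data with a box. */
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            if (src[y * w + x]) {
                if (x < x0) x0 = x;
                if (x > x1) x1 = x;
                if (y < y0) y0 = y;
                if (y > y1) y1 = y;
            }
    if (x1 < 0)
        return -1;
    bw = x1 - x0 + 1;
    bh = y1 - y0 + 1;

    for (y = 0; y < 32 * 32; y++)
        dst[y] = 0;

    /* Independent horizontal and vertical scale factors, so the aspect
       ratio is not preserved. Column offset (32 - 20) / 2 = 6 centers
       the 20 pixel wide result. */
    for (y = 0; y < 32; y++)
        for (x = 0; x < 20; x++) {
            int sx = x0 + (x * bw) / 20;
            int sy = y0 + (y * bh) / 32;
            dst[y * 32 + 6 + x] = src[sy * w + sx];
        }
    return 0;
}
```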

Tests at NIST have shown that size-normalization improves recognition performance when recognizing
handprinted characters. The technique also applies a simple morphological operator in an attempt to normalize the
stroke width within the character image. If the pixel content of a character image is significantly high, then the image
is eroded (strokes are thinned). If the pixel content of a character image is significantly low, then the image is dilated
(strokes are widened). The left image in Figure 23 shows an original segmented character (scaled up 4X) centered
within a 128 by 128 image. The same character spatially normalized using second generation normalization is
displayed in the right image.

Figure 23. Results of size-normalizing a segmented character image.

9.2.2.4 SHEAR CHARACTER IMAGES; src/lib/hsf/shear.c; shear_mis()

As mentioned earlier, not only do the shape and size of handprinted characters vary, but their slant can also
be significantly different. As size-normalization attempts to reduce character variations due to scale,
slant-normalization attempts to reduce character variations due to slant. By effectively reducing these two sources (size and slant) of
variation, a character classifier is left to deal primarily with variations due to character shape.

The slant is removed by a technique that uses horizontal shears in which rows in the image are shifted left or
right in an attempt to straighten the character in the image. Given a segmented character image, the top and bottom
image rows containing black pixels are located. The leftmost black pixel is located in each of the two rows, and a linear
shifting function is calculated to shift the rows in the image so that when finished the leftmost pixels in the top and
bottom rows line up in the same column. The rows between the top and bottom are shifted in lesser amounts based on
the linear shifting function.


A slope factor, $f$, defining the linear shifting function is calculated:

$$ f = \frac{t_l - b_l}{b_r - t_r} \tag{10} $$

where $t_r$ is the vertical position of the top row, $b_r$ is the vertical position of the bottom row, $t_l$ is the
horizontal position of the leftmost black pixel in the top row, and $b_l$ is the horizontal position of the leftmost
black pixel in the bottom row. The slope factor is used to compute a shift coefficient as follows:

$$ s = (r - m) f \tag{11} $$

with $r$ being a vertical row index in the image and $m$ equal to the vertical middle of the image. This causes the shifting
to  be  centered  about  the  middle  of  the  image.  A positive  value  of  the  shift  coefficient  causes  a row  to  be  shifted  s pixel 
positions  to  the  right,  and  a negative  value  causes  a row  to  be  shifted  s pixel  positions  to  the  left. 
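
A minimal C sketch of this shearing step is given below. It is not the shear_mis() routine from the distribution: the name shear_rows, the one-byte-per-pixel row-major layout (nonzero = black, row 0 at the top), the choice m = (h - 1)/2, and rounding s to the nearest integer are assumptions made for illustration. The slope factor is taken as (t_l - b_l)/(b_r - t_r), the sign for which shifting row r by (r - m)f columns to the right brings the two leftmost pixels into the same column.

```c
#include <assert.h>
#include <string.h>

/* Column of the leftmost black pixel in row r, or -1 if the row is blank. */
static int leftmost_black(const unsigned char *img, int w, int r)
{
    for (int c = 0; c < w; c++)
        if (img[r * w + c])
            return c;
    return -1;
}

/* Shear a w-by-h binary image into out and return the slope factor f.
   A blank or single-row image is copied unchanged and 0 is returned. */
double shear_rows(const unsigned char *img, unsigned char *out, int w, int h)
{
    int tr = -1, br = -1;

    assert(img && out && w > 0 && h > 0);

    /* Top and bottom rows that contain black pixels. */
    for (int r = 0; r < h; r++) {
        if (leftmost_black(img, w, r) >= 0) {
            if (tr < 0) tr = r;
            br = r;
        }
    }
    if (tr < 0 || tr == br) {
        memcpy(out, img, (size_t)w * h);
        return 0.0;
    }

    int tl = leftmost_black(img, w, tr);
    int bl = leftmost_black(img, w, br);
    double f = (double)(tl - bl) / (double)(br - tr);
    double m = (h - 1) / 2.0;          /* assumed vertical middle */

    memset(out, 0, (size_t)w * h);
    for (int r = 0; r < h; r++) {
        double s = (r - m) * f;        /* shift coefficient for this row */
        int shift = (int)(s < 0.0 ? s - 0.5 : s + 0.5);
        for (int c = 0; c < w; c++) {
            int nc = c + shift;        /* positive shift moves right */
            if (img[r * w + c] && nc >= 0 && nc < w)
                out[r * w + nc] = 1;
        }
    }
    return f;
}
```

Pixels shifted past the image border are dropped in this sketch; a production routine would need to decide how to handle that case.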

This slant-normalization technique is applied after size-normalization. The result of shearing a size-normalized
handprinted 4 in order to remove the character’s slant is shown (scaled up 8X) in Figure 24. The technique is very
inexpensive to compute and it works very well. This technique occasionally fails to remove the slant from the character
when the top-leftmost black pixel in the image and the bottom-leftmost black pixel in the image do not lie on the same
vertical stroke. This is more likely to happen with characters such as H and M where there are two equally likely peaks
at the top and bottom of the character. In these cases, the slant may not be removed, although the results of the shearing
do cut down on character variations, which is the underlying goal of this process.

Figure 24. Slant removed from a character image via shearing.


9.2.2.5 EXTRACT FEATURES; src/lib/nn/kl_mis.c; kl_transform_mis()

The Karhunen Loeve (KL) transform has many optimal properties and is widely used in the pattern recognition
field. The KL transform is a linear transform and corresponds to the projection of images onto the eigenvectors
of a covariance matrix, where the covariance matrix is created from images of training data such as those distributed
with SD3 and in the top-level directory train. The production of this transform is also known as principal factor or
principal components analysis. The creation of the covariance matrix and its eigenvectors is conducted off-line, and
the computed eigenvectors are stored in a basis function file described in Section 9.2.1.1.


The pixels of a segmented character image define a vector whose elements are obtained by considering the
2-dimensional $N$ by $N$ image as a vector of $N^2$ elements. This vector is formed by concatenating the rows of the image
together, and each binary element is converted according to Equation (13). Black pixels are represented as 1 and white
pixels are represented as -1. The segmented character images have been size-normalized to be 32 by 32 pixels;
therefore, $N = 32$ and $N^2 = 1024$.

$$ \mathbf{u} = (u_{11}, \ldots, u_{1N}, u_{21}, \ldots, u_{2N}, \ldots, u_{N1}, \ldots, u_{NN}) \tag{12} $$

$$ u_{ij} = \begin{cases} 1 & \text{if black pixel} \\ -1 & \text{if white pixel} \end{cases} \tag{13} $$

The mean vector in Equation (14) is computed from all the training images, where $\mathbf{u}_i$ denotes the vector
formed from the $i$-th of the $P$ training images.

$$ \boldsymbol{\mu} = \frac{1}{P} \sum_{i=1}^{P} \mathbf{u}_i \tag{14} $$


This mean vector is subtracted from all the training images, forming a set of sample vectors. Each sample vector
comprises a column in the sample matrix, $\mathbf{U}$. The covariance matrix $\mathbf{R}$ is symmetric and is formed as
the outer product of the $P$ sample vectors as in Equation (15).

$$ \mathbf{R} = \frac{1}{P} \mathbf{U} \mathbf{U}^T \tag{15} $$


The covariance matrix is diagonalized using standard FORTRAN linear algebra routines such as those in EISPACK,
producing the eigenvalues and corresponding eigenvectors in descending order of largest eigenvalue. The covariance
matrix $\mathbf{R}$ has eigenvectors as the columns of $\boldsymbol{\Psi}$ defined in the equation

$$ \mathbf{R} \boldsymbol{\Psi} = \boldsymbol{\Psi} \boldsymbol{\Lambda} \tag{16} $$

and the only nonzero elements of $\boldsymbol{\Lambda}$ are the eigenvalues on its diagonal. The KL transform
$\mathbf{v}$ of a vector $\mathbf{u}$ is the projection of the vector minus the mean vector onto the eigenvector
basis $\boldsymbol{\Psi}$.

$$ \mathbf{v} = \boldsymbol{\Psi}^T (\mathbf{u} - \boldsymbol{\mu}) \tag{17} $$

Typically, only a subset of the eigenvectors corresponding to the largest eigenvalues is used in the transformation.
The initial dimensionality of $\mathbf{u}$ is $N^2$. By selecting only the top k eigenvectors, the dimensionality of the
transformed feature vector $\mathbf{v}$ is reduced to k. In hsfsys, k is selected to be 64. For a more complete discussion
of the effect of feature dimensionality please refer to Reference 22.

Several  steps  have  been  taken  to  increase  the  efficiency  of  the  KL  transform  when  applied  to  binary  images. 
The  first  improvement  is  a pre-multiplication  step  in  which  certain  factors  that  are  not  dependent  on  the  image  data  are 
computed once up front, then these factors are reused over a set of segmented character images. The second optimization
takes advantage of the binary nature of the image data. The details of the implementation can be examined in the
source code file src/lib/nn/kl.c.
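
For reference, the projection of Equation (17) can be written directly in C as below. This is the straightforward form only, not the optimized code in the distribution; the function name kl_project and the storage layout (one eigenvector per row of basis, largest eigenvalue first) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Project a binary image onto the top k eigenvectors.
   pixels: n values, nonzero = black
   mean:   n floats, the mean vector of Equation (14)
   basis:  k rows of n floats, one eigenvector per row
   v:      output feature vector of k floats */
void kl_project(const unsigned char *pixels, const float *mean,
                const float *basis, size_t n, size_t k, float *v)
{
    assert(pixels && mean && basis && v);

    for (size_t j = 0; j < k; j++) {
        const float *e = basis + j * n;   /* j-th eigenvector */
        double acc = 0.0;

        for (size_t i = 0; i < n; i++) {
            /* Equation (13): black -> 1, white -> -1 */
            double u = pixels[i] ? 1.0 : -1.0;
            acc += (double)e[i] * (u - (double)mean[i]);
        }
        v[j] = (float)acc;
    }
}
```

For a 32 by 32 character image in hsfsys, n would be 1024 and k would be 64.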


9.2.2.6 CLASSIFY FEATURE VECTORS; src/lib/nn/pnn.c; treepnnhypscons()

It has been our experience that Probabilistic Neural Networks (PNNs) outperform Multi-Layer Perceptrons
(MLPs) and other popular classifiers in terms of accuracy. The PNN algorithm, in its traditional implementation,
takes a large training set of prototype vectors and uses Euclidean distances from an unknown vector to each of the
training vectors. These distances are computed each time an unknown vector is classified. Similar methods are used in
k-nearest neighbor classifiers. This computation is very expensive, so until now, the slow processing times incurred by
software implementations of PNN have outweighed the accuracy benefits of the classification.

Hsfsys uses a new optimized version of PNN that was developed at NIST. This new software implementation
achieves a factor of 20 improvement in processing time over the traditional PNN when running in the standard
reference recognition system, and the speed improvement is realized without any loss in classification accuracy.

In the traditional PNN, each training vector (or prototype) $\mathbf{x}_j$ becomes the center of a kernel function that
takes its maximum at the vector and decreases gradually as one moves away from the vector in feature space. An unknown
feature vector $\mathbf{y}$ is classified by computing, for each class $i$ containing $M_i$ prototype vectors, the sum of
the values of the class-$i$ kernels at $\mathbf{y}$, multiplying these sums by factors involving the estimated a priori
probabilities, and finding which of $L$ classes has the highest resulting discriminant value. PNN assigns the class with
the highest discriminant value to the unknown vector $\mathbf{y}$.


Many forms are possible for the kernel functions; we have obtained our best results using radially symmetric
Gaussian kernels. The resulting discriminant functions are of the form:

$$ D_i(\mathbf{y}) = \frac{p(i)}{M_i} \sum_{j=1}^{M_i} \exp\left( -\frac{1}{2\sigma^2} d(\mathbf{y}, \mathbf{x}_j^{(i)}) \right) \tag{18} $$

where $\mathbf{x}_j^{(i)}$ is the $j$-th prototype of class $i$ and $\sigma$ is a smoothing parameter that may be optimized
by conducting experiments on a testing set. In this study, $\sigma$ is assigned 2.0 when classifying digits and 3.0 when
classifying alphabetic characters. The a priori probability of class $i$ is $p(i)$, and as mentioned earlier, $M_i$ is the
number of training prototypes in class $i$. The Euclidean distance between two vectors is defined as

$$ d(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^T (\mathbf{x} - \mathbf{y}) \tag{19} $$

The current implementation of the discriminant functions used in hsfsys does not use the leading term in
Equation (18). Further research is needed to determine if the natural frequencies of character occurrence (in the
Constitution box for example) would make good a priori probability estimates and improve classification accuracy. If the
number of training prototypes within each class is approximately balanced, the denominator of the leading term
becomes redundant. Specht shows that the discriminant values can be converted to estimated a posteriori probabilities
by dividing each discriminant value by their sum such that they add up to 1.0.

Several optimizations have been added by NIST to the traditional PNN implementation in order to decrease
computational intensity and improve processing times. The first optimization prunes those prototypes that do not
significantly contribute to the computation of discriminant values, and a second optimization utilizes a search tree
to reduce the number of prototypes used in the discriminant value summation. Due to the presence of the exponential in
Equation (18), the closer a training prototype is to the unknown vector, the more significant the prototype’s
contribution to its discriminant value. In light of this, Equation (18) can be approximated by not including
prototypes whose exponential term contributes less than $10^{-\lambda}$ times the largest term. Formally, the prototype
$\mathbf{x}$ of any given class can be deleted if:

$$ \exp\left( -\frac{1}{2\sigma^2} d(\mathbf{y}, \mathbf{x}) \right) < 10^{-\lambda} \exp\left( -\frac{1}{2\sigma^2} d(\mathbf{y}, \mathbf{x}_c) \right) \tag{20} $$

where the subscript $c$ denotes the closest training prototype. By taking logs and changing sign, the condition in
Equation (20) can be rearranged without the need for computing the exponential function. The resulting test in the
distance domain is

$$ d(\mathbf{y}, \mathbf{x}) > 2 \lambda \sigma^2 \ln 10 + d(\mathbf{y}, \mathbf{x}_c) \tag{21} $$

This technique can be used to approximate the traditional PNN in Equation (18). The associated error can be
constrained by setting $\lambda$ to a sufficiently large positive number. This parameter should not be less than log(P/L),
where P is the number of prototypes, and L is the number of classes. The value used in hsfsys is $\lambda = 4$. This
ensures that classification results will not change between the optimized and traditional PNN implementations.

Two items of importance make using Inequality (21) efficient. First, it is important to note that the training
prototype with smallest distance to the unknown vector (thus contributing the maximum exponential term to its
discriminant value) is not known a priori. The determination of $\mathbf{x}_c$ can be done on the fly, and distances to
each prototype only need to be computed once. During the computation of the distances, a list of eligible prototypes
(prototypes not yet deleted) can be maintained. Eligible prototypes include the closest to the unknown vector found so
far together with all other training prototypes sufficiently close that they do not satisfy the deletion criterion. The
deletion test is conducted by substituting $\mathbf{x}_c$ in Inequality (21) with the closest prototype found so far. As
the distances between the unknown vector $\mathbf{y}$ and each training prototype $\mathbf{x}_j$ are computed, the new
prototype will at times be closer to $\mathbf{y}$ than the closest prototype seen to that point. When this happens, the
current prototype $\mathbf{x}_j$ is assigned to be the new $\mathbf{x}_c$, and all eligible prototypes are retested using
Inequality (21). Using this single pass technique, the distance $d(\mathbf{y}, \mathbf{x}_c)$ can only decrease throughout
the process, so prototypes can be safely deleted along the way and, once deleted, they can never become eligible again.

The second item related to the efficiency of Inequality (21) takes advantage of the fact that the distance
calculations can be preempted once they become sufficiently large to trigger the deletion criterion. Inequality (21)
implies complete calculation of

$$ d(\mathbf{y}, \mathbf{x}) = \sum_{i=1}^{k} (y_i - x_i)^2 \tag{22} $$

If the distance summation exceeds the deletion criterion, the computation of Equation (22) can stop with $i < k$, as
remaining terms contribute nothing more to the outcome of the test. A useful property of the KL transform is that the
features are ranked in order of decreasing variance. Therefore, the first few features of a training prototype contribute
the most to the distance summation. Typically, only four KL coefficients are required to delete a prototype, and only
about 1% of the prototypes are sufficiently close to remain eligible for use in computing discriminant values. Applying
these two improvements (on the fly determination of $\mathbf{x}_c$ and preempting distance calculations) makes pruning
prototypes very efficient, which in turn greatly reduces the computation of discriminant values. The first optimization
step of pruning prototypes achieves a factor of 4 speed up in hsfsys.
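
The single-pass pruning with preempted distance sums can be sketched in C as follows. The names prune_prototypes and bounded_dist2 and the array layout are hypothetical and are not taken from the distribution; the sketch only returns the list of eligible prototypes and their distances, leaving the discriminant summation to the caller.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define LN10 2.302585092994046

/* Distance sum of Equation (22), abandoned as soon as it exceeds limit.
   A return value greater than limit means the prototype is too far. */
static double bounded_dist2(const float *x, const float *y, size_t k, double limit)
{
    double sum = 0.0;
    for (size_t i = 0; i < k && sum <= limit; i++) {
        double d = (double)x[i] - (double)y[i];
        sum += d * d;
    }
    return sum;
}

/* Collect the prototypes that survive Inequality (21) in one pass.
   idx and d2 must each have room for P entries. Returns the number of
   eligible prototypes; idx holds their indices, d2 their distances. */
size_t prune_prototypes(const float *protos, size_t P, size_t k, const float *y,
                        double sigma, double lambda, size_t *idx, double *d2)
{
    const double margin = 2.0 * lambda * sigma * sigma * LN10;
    double dmin = DBL_MAX;   /* d(y, x_c) for the closest prototype so far */
    size_t n = 0;

    assert(protos && y && idx && d2 && sigma > 0.0 && lambda > 0.0);

    for (size_t j = 0; j < P; j++) {
        double limit = (n > 0) ? dmin + margin : DBL_MAX;
        double d = bounded_dist2(protos + j * k, y, k, limit);

        if (d > limit)
            continue;                 /* deleted by Inequality (21) */
        idx[n] = j;
        d2[n] = d;
        n++;

        if (d < dmin) {
            /* New closest prototype: retest the eligible list. */
            size_t kept = 0;
            dmin = d;
            for (size_t e = 0; e < n; e++) {
                if (d2[e] <= dmin + margin) {
                    idx[kept] = idx[e];
                    d2[kept] = d2[e];
                    kept++;
                }
            }
            n = kept;
        }
    }
    return n;
}
```

With sigma = 2.0 and lambda = 4, the margin evaluates to roughly 73.7 in squared-distance units.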

A further optimization has been integrated into the PNN classifier provided in this distribution. This step
utilizes a search tree to reduce the number of prototypes used in the PNN calculations. As stated earlier, PNN discriminant
values require the same distance calculations as those used in nearest-neighbor methods. Nearest-neighbor methods
for character classification have been shown to be competitive with neural network methods.

In nearest-neighbor methods, we have N characters with known identities; each character has been reduced
to a feature vector (or point) in k dimensions. In practical applications, N is large (perhaps 10^ or so) and k is typically
in the range 24 to 64. To classify an unknown character, first reduce it to a feature point using a technique such as the
KL transform. Then calculate the distances between its point and each of the N known points. Any function of the N
distances and the N known classes can be used to classify the unknown character. For example, the unknown character
could be assigned the class of the nearest of the N known points. PNN itself is another example of such a function and
includes some optimal properties. Other functions are described in Reference 22.

This method of classification is expensive; the time is proportional to N, since N distances must be calculated.
There is a large literature on faster methods. Among the best are the k-d tree methods of Bentley, which often
have average searching time proportional to log(N). For our case, k is large and N is relatively small (N is much smaller
than $2^k$) and the training points are sparse in k-dimensional space. Therefore, the logarithmic behavior is not found.
Some slight variations on the k-d tree give searching time proportional to sqrt(N), even for large k. While not as good
as log(N), this search time is a substantial improvement over time proportional to N. A brief description of this search
method is presented here; details may be found in Reference 26.

Construction of the k-d tree is done recursively. The top node contains all N points. The two children of this
node each contain N/2 points. The left child node contains those points whose first feature component has values less
than the median of all first components, $x_1$; the right child node has the remainder. Each of these child nodes is then
divided in half using the medians of the second components of the points in the node, and so on. The depth of the tree
is log2(N), which is less than k for our applications. Construction of the tree takes time proportional to N log(N), but it
is done once off-line and stored in a prototype file as described in Section 9.2.1.2.

Searching for m-nearest neighbors in the k-d tree achieves speed by avoiding distance calculations
for entire sub-trees. In k-d trees, rather than searching for the m closest points, it is more natural to search for
points within distance d of the unknown point, t, as follows. Start at the top node, and let the unknown point have
components $t_i$. Suppose $t_1 < x_1$. Put the left child node on a list to look at later. All the points in the right child
node are at least $x_1 - t_1$ distant from t. If $x_1 - t_1 > d$, ignore the right child node; otherwise put it on the list.
Continue searching by taking one node at a time off the list. If the node has no children, look at the distances of the
point or points in the node and remember the m smallest distances. If the node has children, look at both child nodes and
put one or both of them on the list.

If the distance d is excessively large, too few sub-trees will be discarded and too many distances calculated,
leading to a long search time. If d is excessively small, too many sub-trees will be discarded and too few nearest
neighbors will be found, but this calculation is fast. A reasonable approach is to estimate d, preferably on the small side.
If not enough nearest neighbors are found, d is increased and a new search is made. Also, after m distances less than d
have been calculated, d can be reduced to the largest distance.
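
The construction and fixed-radius search described above can be sketched in C as follows. This is a simplified illustration and not the tree format or search code of the distribution: the names kd_build and kd_within are hypothetical, the tree is kept implicit in a permuted index array, each leaf holds a single point, recursion replaces the explicit node list, and r2 is the square of the search distance d. Widening d and repeating the search is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

static const float *g_pts;   /* qsort comparator context; not thread-safe */
static int g_k, g_dim;

static int cmp_dim(const void *a, const void *b)
{
    float x = g_pts[(size_t)*(const int *)a * g_k + g_dim];
    float y = g_pts[(size_t)*(const int *)b * g_k + g_dim];
    return (x > y) - (x < y);
}

/* Arrange idx[lo..hi) (initially holding point numbers) into a k-d tree.
   split[mid] records the median component value at each internal node. */
void kd_build(const float *pts, int k, int *idx, float *split,
              int lo, int hi, int depth)
{
    assert(pts && idx && split && k > 0);
    if (hi - lo < 2)
        return;
    g_pts = pts; g_k = k; g_dim = depth % k;
    qsort(idx + lo, (size_t)(hi - lo), sizeof *idx, cmp_dim);
    int mid = lo + (hi - lo) / 2;
    split[mid] = pts[(size_t)idx[mid] * k + g_dim];
    kd_build(pts, k, idx, split, lo, mid, depth + 1);
    kd_build(pts, k, idx, split, mid, hi, depth + 1);
}

/* Append to out the points within squared distance r2 of t.
   n is the number of entries already in out; the new count is returned. */
int kd_within(const float *pts, int k, const int *idx, const float *split,
              int lo, int hi, int depth, const float *t, double r2,
              int *out, int n)
{
    if (hi - lo < 2) {                     /* leaf: test the point itself */
        for (int i = lo; i < hi; i++) {
            double d2 = 0.0;
            for (int c = 0; c < k; c++) {
                double d = (double)pts[(size_t)idx[i] * k + c] - (double)t[c];
                d2 += d * d;
            }
            if (d2 <= r2)
                out[n++] = idx[i];
        }
        return n;
    }
    int mid = lo + (hi - lo) / 2;
    double gap = (double)t[depth % k] - (double)split[mid];
    /* The child on t's side is always searched; the other child only
       if the splitting plane lies within the search distance. */
    if (gap < 0.0 || gap * gap <= r2)
        n = kd_within(pts, k, idx, split, lo, mid, depth + 1, t, r2, out, n);
    if (gap >= 0.0 || gap * gap <= r2)
        n = kd_within(pts, k, idx, split, mid, hi, depth + 1, t, r2, out, n);
    return n;
}
```

A caller would fill idx with 0..N-1, call kd_build(pts, k, idx, split, 0, N, 0) once, and then call kd_within(pts, k, idx, split, 0, N, 0, t, d*d, out, 0) for each unknown point. Sorting at every level makes this build slower than the N log(N) construction cited above, a trade accepted here for brevity.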

To estimate the distance d, we use the centroids of each class of known points as trial points and calculate the
distances of the unknown point to the trial points. Then we use a fraction, usually around 0.5, of the smallest such
distance as an estimate for d. In hsfsys, the k-d tree is traversed, possibly several times, using increasing factors to
widen d. These factors are stored in global arrays beginning with the name “tree_cuts”. A different set of cut-off values
is used for classifying digits, alphabetic characters, and the mixed upper and lower case Constitution box. These cut-off
arrays are found at the top of the distribution file src/lib/hsf/field.c and were obtained empirically over a testing set of
KL prototypes (feature vectors).

KL prototype vectors and their indexed k-d tree are calculated off-line using the program mis2pat discussed
in Appendix B. The two optimizations discussed in this section (prototype pruning and k-d tree searching) have been
integrated into an optimized PNN procedure treepnnhypscons() found in the distribution file src/lib/nn/pnn.c. This
procedure traverses the k-d tree producing a relatively small yet viable set of prototypes. This small set of prototypes
is then used to calculate approximated PNN discriminant values according to the deletion criterion defined in
Inequality (21). In very rare cases, no close prototypes are found in the tree search. When this occurs, all the training
prototypes are used in the approximated PNN calculation. The PNN exponential activations are normalized to estimated
probabilities by dividing by their sum and used as classification confidence values.

The optimized version of PNN described in this section runs a factor of 20 times faster than the traditional
PNN code, and tests have shown that the gain in speed has not reduced classification accuracy. The optimizations
introduced by NIST now enable applications to capitalize on the robustness of the PNN algorithm without compromising
processing time.

9.2.3 STORE FIELD RESULTS; src/lib/fet/updatfet.c; updatefet()

The results of character classification are stored in Feature (FET) data structures. Upon completion, hsfsys
writes the contents of two of these structures to FET files. One FET structure and file hold the system’s hypothesized
character classifications and a second FET structure and file hold the confidence values associated with each character
classification. FET files are editable ASCII files similar to MFS files and are designed to contain a list of names and a
multi-column set of data values that are associated with each name. Every line in an FET file contains a name followed
by zero or more values. The names of each entry field on the HSF form comprise the names in the system’s hypothesis
and confidence files. For hypothesis files, the values that follow each name are the characters recognized by the system
concatenated together without space separators as a single field value. A line in the file that has no value after the field
name represents a field that was either not processed or was recognized to be empty. There is a corresponding
confidence value reported in the system’s confidence file for every character classified by the recognition system that is
reported in the system’s hypothesis file. The confidence values for the characters in a field are space-separated on each
line in the FET file. These separators may be a space character 0x20 or a tab character 0x09. Lines are terminated with
the line feed character 0x0A. The library src/lib/fet contains a suite of routines designed to read and write FET files
and manipulate FET structures.


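typedef struct fetstruct {
   int alloc;       /* number of allocated positions in names and values */
   int num;         /* number of contiguous positions currently filled */
   char **names;    /* names from the first column of an FET file */
   char **values;   /* value string associated with each name */
} FET;

Figure 25. C definition of the FET structure.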

Figure 25 lists the C definition of an FET structure that is stored in include/fet.h. The structure contains four
members. Names references an array of character strings corresponding to the names listed in the first column of an
FET file. In the case of hsfsys, the names are entry field identifiers. Values references an array of character strings
holding the values associated with each entry field. For a hypothesis FET structure (one that stores the system’s
hypothesized character classification), each string in values contains the characters recognized by the system for that
specific field. For a confidence FET structure (one that stores confidence values), each string in values contains the
space-separated list of confidence values corresponding to the field value stored in the hypothesis FET. The structure
member alloc holds the number of allocated positions within names and values, and num holds the number of contiguous
positions currently filled in names and values. In hsfsys, the primary routine responsible for manipulating an FET
structure is updatefet() found in src/lib/fet/updatfet.c. It is the responsibility of an application to parse the independent
confidence values from a string stored in the values array. The FET file convention provides a common I/O interface when
manipulating lists of ASCII values that are associated with a common attribute or feature (name). The contents of an
FET structure or file can be integers, floating point numbers, names, and/or any sequence of printable ASCII
characters.
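
Since parsing the confidence values is left to the application, a minimal C sketch of one way to do it is shown below. The function parse_confidences is hypothetical and is not part of the src/lib/fet library; it relies only on the standard strtod(), which skips leading spaces and tabs.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse up to max whitespace-separated numbers from the string s
   (for example one entry of the values array of a confidence FET)
   into out. Returns the number of values parsed. */
size_t parse_confidences(const char *s, double *out, size_t max)
{
    size_t n = 0;

    assert(s && out);
    while (n < max) {
        char *end;
        double v = strtod(s, &end);
        if (end == s)       /* no further number in the string */
            break;
        out[n++] = v;
        s = end;
    }
    return n;
}
```

For the field hsf_9 in Figure 26, the string "1.00 1.00 1.00 1.00 0.82" would yield five values, one per character of the hypothesis 95309.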


Hypothesis File

data/f0000_14/f0000_14.nhy

hsf_0
hsf_1
hsf_2
hsf_3 0123456789
hsf_4 0123456789
hsf_5 0123456789
hsf_6 86
hsf_7 506
hsf_8 8941
hsf_9 95309
hsf_10 891405
hsf_11 01
hsf_12 707
hsf_13 60170
hsf_14 689547
hsf_15 98
hsf_16 6081
hsf_17 77132
hsf_18 314200
hsf_19 78
hsf_20 464
hsf_21 93849
hsf_22 256369
hsf_23 63
hsf_24 224
hsf_25 6902
hsf_26 551339
hsf_27 78
hsf_28 722
hsf_29 5798
hsf_30 21313

hsf_31  bavxujdyohsmzfcwgiakrezpln 
hsf_32  FSHUXIEZRQMLABGVIYPUCOJWH 
hsf_33  WE  THE  PEOPLE  THE  UNTIED 
STAIES  IN  FORM  A MORE  PERFECT  UNION 
ESTABUSH  JUSTICE  INSURE  DOMESTIC 
TRANQUILITY  PROVIDE  FOR  THE  COM- 
MON DEFENSE  OUR  THE  GENERAL  WEL- 
FARE AND  SECURE  THE  BLESSINGS  OF 
LIBERTY  TO  OURSELVES  OUR  POSTERITY 
DO  ORDAIN  ESTABLISH  THE  CONSTTITJ- 
TION  FOR  THE  UNTIED  STAIES  AMERICA 


Confidence File

data/f0000_14/f0000_14.nco

hsf_0
hsf_1
hsf_2
hsf_3 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
hsf_4 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
hsf_5 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00
hsf_6 1.00 1.00
hsf_7 1.00 1.00 1.00
hsf_8 1.00 1.00 1.00 1.00
hsf_9 1.00 1.00 1.00 1.00 0.82
hsf_10 1.00 1.00 1.00 1.00 1.00 1.00
hsf_11 1.00 1.00
hsf_12 0.88 1.00 1.00
hsf_13 1.00 1.00 1.00 1.00 1.00
hsf_14 1.00 1.00 1.00 1.00 1.00 1.00
hsf_15 1.00 1.00
hsf_16 1.00 1.00 1.00 1.00
hsf_17 1.00 1.00 1.00 1.00 1.00
hsf_18 1.00 1.00 1.00 1.00 1.00 1.00
hsf_19 1.00 1.00
hsf_20 1.00 1.00 1.00
hsf_21 1.00 1.00 1.00 1.00 0.64
hsf_22 1.00 1.00 1.00 1.00 1.00 1.00
hsf_23 1.00 1.00
hsf_24 1.00 1.00 1.00
hsf_25 1.00 1.00 1.00 1.00
hsf_26 1.00 1.00 1.00 1.00 1.00 1.00
hsf_27 1.00 1.00
hsf_28 1.00 1.00 1.00
hsf_29 1.00 1.00 1.00 1.00
hsf_30 1.00 1.00 1.00 1.00 1.00
hsf_31 1.00 1.00 0.99 1.00 0.99 1.00 1.00 0.99 1.00
1.00 1.00 0.85 0.90 1.00 0.91 1.00 0.50 0.68 0.68 0.98
1.00 0.95 1.00 0.66 0.57 0.99
hsf_32 0.97 1.00 1.00 0.90 1.00 1.00 1.00 1.00 0.99
1.00 1.00 1.00 1.00 0.94 1.00 1.00 1.00 1.00 0.99 0.98
1.00 1.00 1.00 1.00 0.70
hsf_33


Figure 26. Example of a system hypothesis and corresponding confidence file.

There are 10 completed HSF forms provided in this distribution under the top-level distribution directory data.
Hypothesis and confidence files created by hsfsys at NIST have been included. Figure 26 lists the contents of the
hypothesis file data/f0000_14/f0000_14.nhy followed by the corresponding confidence file
data/f0000_14/f0000_14.nco. Notice that confidence values are provided for every field processed except for the
Constitution field (hsf_33), which had dictionary-based postprocessing applied. Also note that any line continuations and
hyphenation within the values for a single field have been inserted by this document’s text formatter and do not actually
exist in the files.

9.2.4 DEALLOCATE FOR FIELDS; src/lib/hsf/field.c; free_field()

This step simply deallocates all the memory containing data dependent on the type of field being processed.
These are the data items loaded into the system by the INITIALIZE FOR FIELDS step. The memory allocated to all
the basis functions and intermediate calculations supporting feature extraction and all the prototypes and class
information needed for character classification are deallocated.

9.3 DO LOWER CASE FIELD; src/lib/hsf/field.c; do_alpha_field()

This section describes how hsfsys processes fields containing handprinted lower case characters such as field
hsf_31 on the HSF form. Figure 27 lists the steps used to process lower case fields and cross-references the steps to
the software distribution according to file and subroutine name. Notice that many of the steps used in digit field
processing are applied here as well. Those steps reused are referenced with heading numbers pointing to the previous
sections and will not be discussed in this section.

The differences between processing digit fields and lower case fields are in what feature extraction and
character classification files are loaded, and how segmentation is conducted. Lower case feature extraction requires
loading the basis function file weights/td13_l.evt, and lower case classification requires loading the prototype file
weights/td13_l.pat and the median vector file weights/td13_l.med. Unlike processing digits, the same basis function,
prototype, and median vector files are loaded when processing lower case characters regardless of whether the small
memory mode option “-m” is specified. The differences in segmentation are discussed below.

9.3.1 PROCESS ALPHABETIC FIELD; src/lib/hsf/field.c; process_alpha_field()

Lower case field and digit field processing differ only slightly in how characters are segmented. Otherwise,
there is nearly no difference in the way the recognition system processes lower case fields and digit fields. The
handprint entered in each field is extracted using the same one-line text isolation routine. The segmented character
images are normalized in terms of size and slant using the same techniques applied to digit images. Features are
extracted from the lower case character images using the same techniques, and the same PNN classifier is used to
classify the feature vectors. Finally, the results of classification are added to the same hypothesis and confidence FET
structures that hold the digit field results.

9.3.1.1 SEGMENT ALPHABETIC FIELD; src/lib/hsf/segblob.c; segbinblob()

As can be seen from Figure 27, the general blob extraction routine segbinblob() used in segmenting digit
fields is the only step used to segment lower case fields. There is no added step of pasting blobs back together after the
lower case field is separated into its connected components. This leaves the system vulnerable to the issues of under-
and over-segmenting discussed in Section 9.2.2.2.2. A lower case segmentor that attempts to subdivide blobs that are
too large and merge blobs that are too small would likely improve the system.

Figure 27. Steps to process the lower case field.

Figure 28. Steps to process the upper case field.

Figure 29. Steps to process the Constitution field.


9.4 DO UPPER CASE FIELD; src/lib/hsf/field.c; do_alpha_field()


The steps used to process the upper case field hsf_32 on the HSF form are nearly identical to those used to
process lower case fields. The only difference is in which feature extraction and character classification files are loaded.
Upper case feature extraction requires loading the basis function file weights/td13_u.evt, and upper case classification
requires loading the prototype file weights/td13_u.pat and the median vector file weights/td13_u.med. When processing
upper case characters the small memory mode option “-m” is ignored. The heading numbers referenced in Figure
28 point back to steps discussed in previous sections.

9.5 DO CONSTITUTION FIELD; src/lib/hsf/field.c; do_const_field()

The steps required to process the Constitution field (hsf_33) are listed in Figure 29. Again, a number
of the steps used here have already been discussed in previous sections. Two things make processing the Constitution
field different from the digit, lower case, and upper case fields: the processing of multiple lines of
text within the same field, and the optional dictionary-based postprocessing. All other steps apply the same techniques.
Feature extraction requires loading the basis function file weights/ul.evt, and character classification requires loading
the prototype file weights/td13_ul.pat and the median file weights/td13_ul.med. If the small memory mode option is
used to invoke hsfsys, a smaller set of prototypes and their associated files are loaded instead. These files begin with
the root file name td3_ul_s in the top-level directory weights. The basis functions and prototype files used to process
the Constitution field have been designed to assign both lower and upper case instances of the same character a
single upper case classification. For example, an H and an h are both classified as H. This was done because people
frequently switch between lower and upper case when handprinting textual information.


9.5.1 PROCESS CONSTITUTION FIELD; src/lib/hsf/field.c; process_const_field()

Processing the Constitution field is different from the previous types of fields because it involves handling
multiple lines of text within the same field, and contextual postprocessing at the word-level is possible. Even with these
differences, there is still a large overlap with the steps already discussed. The character segmentor simply extracts
blobs using the connected component utility. The segmented character images are normalized in terms of size and slant
using the same techniques discussed earlier. Features are extracted from the segmented character images using the KL
transform, the same optimized PNN classifier is used to recognize the feature vectors, and the results of classification
are added to the same hypothesis and confidence FET structures that hold the previous field results. Confidence values
are reported for the raw OCR results, but no confidence values are reported when dictionary-based postprocessing is
performed.

9.5.1.1 ISOLATE MULTIPLE LINE FIELD; src/lib/hsf/isolate.c; iso_nline_field()

Three of the six registration marks used to register the image lie on corners of the Constitution box at the
bottom of the HSF form. Because of this, form removal is quite accurate at removing the black pixels comprising the
Constitution box from the input image. The spatial field template stored in tmplt/hsftmplt.pts is used to extract the field
subimage. The handprinted data within the field is then isolated using spatial histogram techniques like the ones used
to locate the left and right ends of the handprinted text in a one-line text field. Left, right, top, and bottom edges are
found by searching horizontal and vertical histogram bins directly. Searching inward from the beginning or end of the
histogram bins, the first bin greater than 10 pixels is located, and then a reverse search from that point locates the first
bin that equals zero. The use of a 10 pixel threshold avoids speckle noise in the field and locates the beginning of
significant character data, while the reverse search locates the edge of the character data. Hsfsys extracts the subimage
bounded by these left, right, top, and bottom edges.
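
The edge search described above can be sketched in C as follows. This is a minimal illustration and not the code in isolate.c: the function names find_leading_edge() and find_trailing_edge(), the -1 return convention, and the choice to report the bin just inside the empty bin are assumptions made for the example. Only the 10 pixel threshold and the forward-then-reverse search order come from the text.

```c
/* Threshold from the text: a bin must hold more than 10 black pixels to
   count as significant character data rather than speckle noise. */
#define NOISE_THRESHOLD 10

/* Search inward from the start of the histogram. Returns the index of the
   first bin belonging to the character data, or -1 if no bin exceeds the
   threshold. (Hypothetical helper, not part of the distribution.) */
int find_leading_edge(const int *bins, int nbins)
{
    int i, j;

    /* Forward search: first bin with significant data. */
    for (i = 0; i < nbins && bins[i] <= NOISE_THRESHOLD; i++)
        ;
    if (i == nbins)
        return -1;

    /* Reverse search: back up until the first empty bin is met. */
    for (j = i; j >= 0 && bins[j] != 0; j--)
        ;
    return j + 1;   /* assumed convention: edge is the bin after the empty one */
}

/* Mirror image of find_leading_edge(), searching inward from the end. */
int find_trailing_edge(const int *bins, int nbins)
{
    int i, j;

    for (i = nbins - 1; i >= 0 && bins[i] <= NOISE_THRESHOLD; i--)
        ;
    if (i < 0)
        return -1;

    for (j = i; j < nbins && bins[j] != 0; j++)
        ;
    return j - 1;
}
```

Applied once to the horizontal histogram and once to the vertical histogram, the two functions would give the left/right and top/bottom edges. Small isolated counts outside the first empty bin are skipped as speckle.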

9.5.1.2 BUILD PHRASE LISTS; src/lib/phrase/bld_pis.c; build_pi_lists()

The  connected  component  utility  used  to  segment  the  isolated  Constitution  field  returns  blobs  in  column- 
major  order.  This  section  describes  the  steps  used  to  sort  the  blobs  into  correct  reading  order.  Initially  it  was  anticipated 
that  a simple  sort  of  the  x and  y center  coordinates  of  each  blob  would  be  sufficient  to  organize  the  blobs  into  reading 
order  (left-to-right  and  top-to-bottom).  Unfortunately,  it  was  found  that  the  handprint  in  the  Constitution  field  often 
fluctuates significantly within lines as well as across lines, and this fluctuation is exaggerated by the use of punctuation
marks, causing techniques that use global line statistics to fail.

A localized point-to-point technique was developed to organize the segmented blobs into phrases (text lines).
The method is divided into three steps. First, the extracted blobs are collected into segments of text lines. Second, the
phrase segments are merged into complete lines of text. Finally, the text lines are sorted top-to-bottom so that the order
of the blobs within the lines correctly reconstructs the sequence of characters in the text paragraph.

9.5.1.2.1 Find Phrases; src/lib/phrase/find_pis.c; find_pi_lists()

The connected component utility produces segmented character images. Each segmented image has assigned
to it the position where it was extracted from the isolated field image. The location of each blob is identified by
computing the geometric center of the smallest rectangle bounding the blob. Adding each blob’s center to the location from
which the blob was extracted produces a 2-dimensional grid of blob centers that can be used to reconstruct the line
trajectories of the handprinted text.

The process of organizing the blob centers into text lines is referred to as Adaptive Sequence Reconstruction.
This technique searches the 2-dimensional grid of blob centers, taking into account local writing fluctuations, to sort the
blobs into correct reading order. A point-to-point search is conducted based on a local search space defined by the
function:

$$S = a \cos(b\theta)$$

This function, which is similar to an antenna sensitivity model, forms a tear-drop shaped bubble that is desirable for
this application because it is horizontally biased. The interior of the function is used as a locally constrained search
space. Through empirical study a technique for controlling the shape of the S function was developed. At values of b
near 0.1, the function’s shape is circular, and as b increases the shape continuously forms into a tear-drop. The variable,
a, controls the length of the bubble along its horizontal axis of symmetry. By increasing a, the length of the bubble is
increased and the search is extended in the horizontal direction.

A linear control function, b=L(a), is used to modify the shape of the bubble, b, as the length of the bubble,
a, is increased. This function is defined by the slope of the line connecting two empirically derived points. One point
used to define the control line, (a1, b1), is calculated:

$$a_1 = h \times 0.375 \qquad b_1 = 0.1 \tag{21}$$

where h is the average blob height for the writer. If, for example, the writer’s average blob height is 32 pixels, this point
on the linear control function defines a circular bubble with a radius of 12 pixels (1 mm). The second point used to
define the control line, (a2, b2), is calculated:

$$a_2 = h \times 4.7 \qquad b_2 = 2.0 \tag{22}$$

If  the  writer’s  average  blob  height  is  32  pixels,  this  second  point  defines  a tear-shaped  bubble  with  a horizontal  length 
of  150  pixels.  If  a writer’s  handprint  is  small,  the  bubbles  used  in  the  search  are  adapted  to  be  smaller,  and  if  a writer’s 
handprint  is  large,  the  bubbles  used  in  the  search  are  adapted  to  be  larger.  In  addition,  the  bubble  defining  the  search 
space continuously changes from circular to tear-drop in shape as the extent of the search increases.

Using the linear control function L, the size and shape of the bubble can be continuously modified as shown
in Figure 30. In these three examples, average blob heights of 16, 32, and 48 are used, respectively. If a search is to be
conducted relative to the right of a blob’s center, then only the portion of the function with x>0 is used. If a search is
to be conducted relative to the left of a blob’s center, then the portion of the function with x<0 is used. The search is
conducted by initializing a to a starting length and then testing to see if any other blob centers are located within the
boundary of the bubble. If points are found, then the nearest blob is selected. Otherwise, a is incremented, the bubble
is enlarged and lengthened, and a test for blobs in the new bubble is conducted. This continues until a center point
of a neighboring blob is found, or a exceeds some threshold. In Figure 30, successive bubbles are overlaid from a
common center point where a is initialized to 12, incremented by 20, and terminated at 300.


Figure  30.  Adaptation  of  bubbles  to  the  writer’s  average  blob  height. 

The search begins from the blob center closest to the top-left corner of the field. The blob is added to an empty
list, and a bubble is initialized, tested, and then grown via incrementing a until either a neighboring blob center is found
or a exceeds a threshold. The threshold used in this system is 300, which is equal to 1 inch (12 pixels per millimeter
equals 300 pixels per inch). In general, the new blob is added to the list and the search resumes from the center of the
new blob. However, if the new blob’s center does not meet given criteria, then its center is added to the list, but the
search continues from the current blob center and does not advance to the center of the new blob. This way the system
does not naively follow erratic line trajectories, minimizing the chances of crossing over into adjacent lines, such as
may happen when a comma is found.


Figure  31.  Heuristic  used  to  control  the  advancement  of  the  search. 

Figure 31 illustrates this heuristic as it is used to control the advancement of the search. In order to advance,
the new blob center must be within the area defined by the union of two regions. The first region is the area bounded by
two lines with slopes -0.25 and +0.25 projecting from the current blob center. The second region is the area bounded
by two horizontal lines with y-intercepts at -(0.25*h) and +(0.25*h), centered about the last blob added to the list. The
top diagram in Figure 31 shows bubbles projected from the current blob center (1). The closest neighboring blob center
is (2); however, (2) is not within the given criteria represented by the region filled with gray. Therefore, (2) is added to
the list with (1), but the current bubble position does not advance to (2). The middle diagram in Figure 31 shows the
search continuing with bubbles being projected from (1). The next closest blob center is (3). Notice that the gray region
has changed from the top diagram. The triangular slope-based region remains anchored to (1), but the horizontal
region, based on the writer’s average blob height, is now defined in relationship to (2). Blob center (3) is added to the
list with (1) and (2), and because (3) is within the new gray region, the current blob center advances to (3) as shown
in the bottom diagram in Figure 31. Note that once a blob center is added to the list, it is not considered again in the
search process.
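
A minimal C sketch of this advancement test is given below. The Point type and the name may_advance() are hypothetical, the search is assumed to run to the right of the current blob center, and boundary points are assumed to count as inside; the slopes of 0.25 and the band of 0.25*h come from the text.

```c
typedef struct { double x, y; } Point;   /* hypothetical blob-center type */

/* Nonzero if the search may advance from cur to cand.
   cur        - blob center the bubbles are currently projected from
   last_added - last blob center appended to the list
   cand       - newly found neighboring blob center
   h          - writer's average blob height */
int may_advance(Point cur, Point last_added, Point cand, double h)
{
    double dx  = cand.x - cur.x;
    double dy  = cand.y - cur.y;
    double off = cand.y - last_added.y;

    /* Region 1: wedge between the lines of slope -0.25 and +0.25
       projecting from the current blob center. */
    int in_wedge = dx > 0.0 && dy <= 0.25 * dx && dy >= -0.25 * dx;

    /* Region 2: horizontal band of +/- 0.25*h about the last blob added. */
    int in_band = off <= 0.25 * h && off >= -0.25 * h;

    return in_wedge || in_band;
}
```

In the scenario of Figure 31, the first call would pass (1) as both cur and last_added; once (2) has been appended without advancing, later calls keep (1) as cur and pass (2) as last_added.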

It was observed during the development of this approach that the heuristic described above, when tuned to
handle isolated cases, did not yield proper results in other cases. It was determined that as local fluctuations in the
handprint become excessive, rather than force the system to make a guess, the point-to-point search should be preempted.
The search is restarted from a blob not yet included in any lists and closest to the top-left of the image. This action is
also taken at the end of a text line when no new neighboring blob centers are found to the right of the current blob.
Each restart involves starting a new list, and the entire search process is terminated when every blob in the image has
been assigned to a list.

The criterion for preempting the search and beginning a new list is illustrated in Figure 32. In this illustration,
the search is currently being conducted from blob center (2), and (3) has been located as the next nearest neighbor. In
the previous step, (2) was found by searching from (1), and both (1) and (2) have been added to the current list. The
distance, d1, is the vertical distance between (1) and (2), and the distance, d2, is the vertical distance between (2) and
(3). The two parallel horizontal lines in the diagram represent the area bounded by -h and +h centered about the
previous blob center (1). The sum of (d1+d2), the vertical distance between (1) and (3), exceeds the limit, l, representing
the region bounded by the horizontal lines; therefore the search is preempted and (3) is not added to the current list.
Blob (3) is left unassigned so that it can be added to a list later in the search process.
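
The preemption test can be written as a short C function. This is a sketch under stated assumptions: the name should_preempt() is hypothetical, d1 and d2 are treated as signed vertical steps so that their sum is the vertical offset of (3) from (1), and the limit is taken to be the band of -h to +h about (1).

```c
/* Nonzero if the point-to-point search should be preempted.
   y_prev - y-coordinate of the previous blob center (1)
   y_cur  - y-coordinate of the current blob center (2)
   y_next - y-coordinate of the candidate blob center (3)
   h      - writer's average blob height */
int should_preempt(double y_prev, double y_cur, double y_next, double h)
{
    double d1 = y_cur - y_prev;    /* signed vertical step from (1) to (2) */
    double d2 = y_next - y_cur;    /* signed vertical step from (2) to (3) */
    double total = d1 + d2;        /* vertical offset of (3) from (1) */

    return total > h || total < -h;
}
```

When the function returns nonzero, the candidate would be left unassigned and a new list started from the unassigned blob closest to the top-left of the field.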


Figure  32.  Heuristic  used  to  preempt  the  search. 

It is surprising how well this preemptive heuristic works. At times, more frequently with some writers than
with others, the local writing fluctuations become excessive and the search is restarted. Often the restart resumes on
the next line and the point-to-point search is successful in tracking the next line. The search is top-down by nature, so
that the blobs in the line above the area of excessive fluctuation are likely to be assigned to a previous list. Eventually
the left-most blob involving the fluctuation is the closest remaining blob to the top-left of the field, and the point-to-
point search resumes from that blob. All neighboring blobs from the line above and the line below have been
previously assigned, leaving only the blobs comprising the fluctuation exposed. This greatly reduces the system’s
guesswork and thereby reduces system errors. Figure 33 shows the results of segmenting the Constitution field in Figure 1
and using the bubble technique to sort the blobs into lines. A bubble is traced from each point where a neighboring
blob was found, and each bubble reflects the actual size and the shape of the search space used to locate the neighbor.

Figure 33. Traces of the bubbles used to sort the blobs into lists.


9.5.1.2.2 Merge Phrases; src/lib/phrase/merg_pis.c; merge_pi_lists()

The point-to-point search produces multiple lists of blobs. Some of the lists represent complete lines of text,
and other lists may represent only fragments of the text lines printed. A final merging process is required so that, upon
completion, only lists containing complete text lines remain. Two heuristics are used for merging the blob lists; they
are illustrated in Figure 34 and Figure 35. The lists are sorted in descending order according to the number of blobs in
each list. The longest list is first compared against all other lists, applying the first heuristic and then the second to each
comparison. If two lists meet the merging criteria, they are merged and the looping process is restarted by resorting the
lists. Otherwise, the next longest list is compared to the remaining shorter lists, and so on until all the lists are looped
through and no merging takes place. When two lists are merged, their blob centers are appended into one larger list
and then sorted on their x-coordinates. Figure 34 illustrates the merging of two lists when the end point of the shorter list
is within a vertical distance of -(0.75*h) and +(0.75*h) of the longer list’s corresponding end point.


Figure 34. Heuristic for merging blob lists based on end point positions.

Figure 35. Heuristic for merging blob lists based on line trajectories.

The second merge heuristic is illustrated in Figure 35. In this case, the blob centers comprising the longer list
are fitted using linear least squares to produce a slope and y-intercept. A perpendicular distance is computed between
each blob center in the shorter list and the line fitted to the longer list, and the distances less than (0.5*h) are counted.
This area along the fitted line is represented by the gray region in the diagram. The two lists are merged if the count is
greater than (0.1*n), where n is the number of blobs in the longer list. This facilitates merging of lists that lie along the
same line trajectory, but whose end points are somewhat erratic.
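
The trajectory-based merge test can be sketched in C as follows. The Point type and the names fit_line() and should_merge() are hypothetical and do not come from merg_pis.c; the fit is an ordinary least-squares fit of y on x, which is one reasonable reading of "linear least squares". The perpendicular distance is compared in squared form so that no square root is needed.

```c
typedef struct { double x, y; } Point;   /* hypothetical blob-center type */

/* Fit y = m*x + c to n points by linear least squares.
   Returns 0 on success, -1 if the fit is degenerate. */
int fit_line(const Point *p, int n, double *m, double *c)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, den;
    int i;

    for (i = 0; i < n; i++) {
        sx  += p[i].x;
        sy  += p[i].y;
        sxx += p[i].x * p[i].x;
        sxy += p[i].x * p[i].y;
    }
    den = n * sxx - sx * sx;
    if (n < 2 || den == 0.0)
        return -1;

    *m = (n * sxy - sx * sy) / den;
    *c = (sy - *m * sx) / n;
    return 0;
}

/* Nonzero if the shorter list should be merged into the longer one:
   count the shorter list's centers within 0.5*h of the fitted line and
   merge when the count exceeds 0.1 * (blobs in the longer list). */
int should_merge(const Point *longer, int n_long,
                 const Point *shorter, int n_short, double h)
{
    double m, c, limit = 0.5 * h;
    int i, count = 0;

    if (fit_line(longer, n_long, &m, &c) != 0)
        return 0;

    for (i = 0; i < n_short; i++) {
        /* perpendicular distance is |num| / sqrt(m*m + 1) */
        double num = m * shorter[i].x - shorter[i].y + c;
        if (num * num < limit * limit * (m * m + 1.0))
            count++;
    }
    return count > 0.1 * n_long;
}
```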

9.5.1.2.3 Sort Phrases Top To Bottom; src/lib/phrase/sort_pis.c; sort_pi_lists_on_y()

As  a result  of  the  previous  two  steps,  the  blobs  segmented  from  the  Constitution  field  are  now  gathered  and 
sorted  into  lines.  The  last  step  is  to  sort  the  resulting  lines  vertically.  The  lines  are  sorted  based  on  the  y-coordinate 
values  of  the  first  blob  in  each  line.  When  finished,  the  correct  reading  order  has  been  reconstructed. 
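
As an illustration of this final step, the sketch below orders lines by the y-coordinate of their first blob. It substitutes the standard C library qsort() for the sorting utility shipped with the distribution, and the Center and TextLine types and the function names are hypothetical. Image coordinates with y increasing downward are assumed.

```c
#include <stdlib.h>

typedef struct { int x, y; } Center;      /* hypothetical blob center */

typedef struct {
    const Center *blobs;   /* blobs of one line, already sorted left to right */
    int nblobs;            /* assumed to be at least 1 */
} TextLine;

/* Order two lines by the y-coordinate of their first blob. */
static int cmp_first_y(const void *pa, const void *pb)
{
    const TextLine *a = (const TextLine *)pa;
    const TextLine *b = (const TextLine *)pb;
    int ya = a->blobs[0].y, yb = b->blobs[0].y;
    return (ya > yb) - (ya < yb);
}

/* Sort the lines so that the top-most line comes first. */
void sort_lines_top_to_bottom(TextLine *lines, int nlines)
{
    qsort(lines, (size_t)nlines, sizeof(TextLine), cmp_first_y);
}
```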

To conduct the various sorts in hsfsys, a multiple-indexed recursive quick sort utility has been provided with
this software distribution. The utility is multisort() and is found in src/lib/util/multsort.c. The utility is capable of
sorting on up to 5 integer keys (primary, secondary, etc.), and the values sorted can be an array of integers or an array of
pointers. Each of the 5 keys can be sorted independently increasing or decreasing. A macro-based interface has been
developed to help the caller access the flexible capabilities of this utility. The macro definitions are stored in
include/multsort.h and examples of how they are used can be seen in src/lib/util/sortindx.c.

9.5.1.3 CORRECT AND IDENTIFY WORDS; src/lib/phrase/spellphr.c; spell_phrases2()

No contextual information has been used up to this point by the recognition system other than knowing the type
of each field (digit, lower case, upper case, or Constitution). Of these field types, only the Constitution box has data
that can be processed using language or word models. A dictionary-based postprocessing capability has been
integrated into the system, and it can be optionally selected from the command line as described in Section 8.4.

If dictionary-based postprocessing is not selected, the system stores the raw hypothesized character
classifications and their corresponding confidence values to the output FET structures, which upon completion, are written to
hypothesis and confidence files. If dictionary-based postprocessing is selected, then the raw character classifications
are further processed. The system’s hypothesized classifications are prone to errors. These errors are introduced when
the connected components over- and under-segment the field image and when the classifier assigns an incorrect class
to a properly segmented character image. By using a dictionary, many of these errors can be corrected.

9.5.1.3.1 Spell-Correct Line Of Text; src/lib/dict/line.c; spell_line2()

The dictionary-based postprocessing described in this section is referred to as Word Identification using
Fanout Signals. Up to this point in the system, the handprinted text within the Constitution box has been isolated, the
extracted field image has been segmented into blobs, and the resulting blob images have been sorted into correct
reading order. Each blob image has also been size- and slant-normalized, and its features have been extracted and classified.
These steps were applied to the image in Figure 33, producing the text shown in Figure 36. Notice that the character
classifications are all upper case due to the merged upper and lower case classes in the prototype file weights/ul.kl; notice the
large number of errors contained in the classifier output; and also notice there are no inter-word spaces recognized at this
point in the process. The dictionary-based postprocessing has been developed to correct these classification errors and
to detect word boundaries within text lines like the ones shown in Figure 36.

Line  1:  ILAUTHEPEOPLEOFTIIEUHLTEASTCIESLNORDERTOFORMAMORE 

Line  2:  PTRFECTUHOHLOTABLLSHJUSTICELLNSUREDOMESTLC 

Line  3;  TRSNQUILHTYJPROIDEFURTHECOMAONDEFMSEPROLNURERHE 

Line  4:  GENIRALWWMRCANDSEEURTTHEBnSSLNGSOFLLBEirrYTO 

Line  5:  OURSEIVOSANDOURPOSTLRITYDClORDALNCNDNTABnSH 

Line  6:  MIJUNSTTIUTIONFORGEUNLTEDSRARESMFAMERDCA 

Figure 36. Example of classifier output prior to contextual processing.

The Preamble to the U.S. Constitution is comprised of 38 unique words, and these words are used to construct
the dictionary (lexicon) shown in Figure 37. The lexicon is used to detect words within text lines, identifying word
boundaries and correcting any segmentation and classification errors existing within the text lines.

Figure 37. Lexicon constructed from the text contained in the Preamble to the U.S. Constitution.


The technique is illustrated by the example shown in Figure 38. In this example, a portion of the first line of
text in Figure 36, “STCTESLNORDE”, is being processed. The graph plots the floating point numbers listed in the
first column. These numbers form a signal which is processed in order to locate words within the text. The generation
of these signals will be discussed later. The second column is a fan-out of hypothesized words beginning with the
character S and adding one successive character from the text line, forming a new hypothesized word on each row down
the column. The maximum length of a hypothesized word is 12 characters, which is the length of the longest word in
the lexicon, “CONSTITUTION”. The third column lists the best match from the lexicon for each hypothesized word
in the second column. The fourth column lists alignments that are produced using the Levenshtein Distance to match
the hypothesized word to the lexicon match. In the alignments, 0 represents a correct character, 1 represents a
substituted character, 2 represents an inserted character, and 3 represents a deleted character. These alignments are used to
generate the signals listed in the first column and plotted in the graph.

Figure 38. Signals generated from a fan-out of hypothesized words.
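
To make the alignment codes concrete, the following C sketch computes a Levenshtein alignment between a hypothesized word and a lexicon word and writes it out as a string of the digits 0 to 3. It is a textbook dynamic-programming implementation and not the routine in src/lib/dict. The name align_words(), the MAXLEN bound, and the tie-breaking order in the traceback are assumptions, as is the convention that an "inserted" character is an extra character in the hypothesized word and a "deleted" character is a lexicon character missing from it.

```c
#include <string.h>

#define MAXLEN 32   /* assumed bound; fan-out words are at most 12 characters */

/* Align hyp against ref. Writes one code per alignment element into out
   ('0' correct, '1' substituted, '2' inserted, '3' deleted); out must hold
   strlen(hyp) + strlen(ref) + 1 characters. Returns the edit distance, or
   -1 if either word is longer than MAXLEN. */
int align_words(const char *hyp, const char *ref, char *out)
{
    int d[MAXLEN + 1][MAXLEN + 1];
    int n = (int)strlen(hyp), m = (int)strlen(ref);
    int i, j, k = 0, a, b;

    if (n > MAXLEN || m > MAXLEN)
        return -1;

    /* Fill the distance table. */
    for (i = 0; i <= n; i++) d[i][0] = i;
    for (j = 0; j <= m; j++) d[0][j] = j;
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int sub = d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]);
            int ins = d[i - 1][j] + 1;
            int del = d[i][j - 1] + 1;
            int best = sub < ins ? sub : ins;
            d[i][j] = best < del ? best : del;
        }
    }

    /* Trace back from the end, preferring match/substitution on ties. */
    i = n; j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])) {
            out[k++] = (hyp[i - 1] == ref[j - 1]) ? '0' : '1';
            i--; j--;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            out[k++] = '2';            /* extra character in hyp */
            i--;
        } else {
            out[k++] = '3';            /* lexicon character missing from hyp */
            j--;
        }
    }
    out[k] = '\0';

    /* The codes were collected right to left; reverse them. */
    for (a = 0, b = k - 1; a < b; a++, b--) {
        char t = out[a]; out[a] = out[b]; out[b] = t;
    }
    return d[n][m];
}
```

Under these conventions, aligning the hypothesized word STCTES with the lexicon word STATES yields the alignment 001000 with one error.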

A signal  value,  s,  is  computed  from  two  terms,  e and  t.  The  first  term,  e,  is  an  error  term  and  is  computed: 

$$e = \frac{n}{l}\, g \tag{23}$$

where n is the number of errors (1’s, 2’s, and 3’s) in a hypothesized word’s alignment, l is the total number of characters
in the alignment, and g is the number of contiguous groupings of 1’s and 3’s. The Levenshtein Distance strictly
minimizes the amount of error in the alignment without regard for the resulting configuration of alignment elements. The
variable g is used to favor hypothesized words whose alignments contain contiguous groupings of correct characters
(0’s) over alignments containing many discontinuities.

The second term used to compute the signal is t. This is a translation term based on the linear function, T, that
biases longer hypothesized words over shorter ones. In this way, matches to the word “DOMESTIC” are favored over
matches to the word “DO”, and “INSURE” is favored over the word “IN”. The linear translation function used in this
study is defined by the empirically derived points (2, 0.5) and (12, 0.4), such that t=0.5 for hypothesized words of
length 2, and t=0.4 for hypothesized words of length 12. The translation term is determined by locating the point on
the line at the position corresponding to the length of the hypothesized word’s dictionary match. If p is the length of
the dictionary match, then t=T(p). Signal value, s, is then computed:

$$s = 1.0 - e - t \tag{24}$$
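
The pieces of the signal computation that are fully specified in the text can be sketched in C as follows. The function names are hypothetical. The error term e of Equation 23 is taken as an input rather than recomputed here; alignment_counts() only gathers the quantities n, l, and g that it is built from. A run that mixes 1's and 3's is assumed to count as a single grouping.

```c
#include <string.h>

/* Translation term t = T(p): the line through the empirically derived
   points (2, 0.5) and (12, 0.4), where p is the length of the
   dictionary match. */
double translation_term(int p)
{
    return 0.5 + (p - 2) * (0.4 - 0.5) / (12.0 - 2.0);
}

/* Gather n (number of 1's, 2's, and 3's), l (alignment length), and
   g (number of contiguous groupings of 1's and 3's) from an alignment
   string made of the digits '0' to '3'. */
void alignment_counts(const char *align, int *n, int *l, int *g)
{
    const char *c;
    int in_group = 0;

    *n = 0;
    *g = 0;
    *l = (int)strlen(align);

    for (c = align; *c != '\0'; c++) {
        if (*c != '0')
            (*n)++;
        if (*c == '1' || *c == '3') {
            if (!in_group)
                (*g)++;                /* a new grouping starts here */
            in_group = 1;
        } else {
            in_group = 0;
        }
    }
}

/* Signal value of Equation 24, given the error term e and the length p
   of the dictionary match. */
double signal_value(double e, int p)
{
    return 1.0 - e - translation_term(p);
}
```

With these definitions, a perfect match (e = 0) to a 12-character word gives a signal of 0.6, while a perfect match to a 2-character word gives 0.5, which is the bias toward longer words described above.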

The signals listed in the first column of Figure 38 are searched top to bottom. Only those hypothesized words
with s>0 are considered to contain possible words. All other hypothesized words in the fan-out are ignored. The
hypothesized word with the largest signal strength is selected. If this word is a substring of a hypothesized word further
down the list, such as “DO” in “DOMAIN”, and the word containing the substring has a signal strength, s>0, then the
longer word is selected in place of the word with maximum signal.

Once a hypothesized word is selected from the fan-out, the lexicon match for that word is pushed onto a stack
and the alignment is used to synchronize the processing. As can be seen in Figure 38, characters that match between
the hypothesized word and the lexicon match are represented by 0’s. The alignment elements between the left-most 0
and the right-most 0 comprise an alignment span, and it corresponds to the characters in the lexicon match. The ends
of this alignment span demarcate the boundaries of the word within the original line of text. Processing the signals in
this fashion is done recursively. If a portion of the fan-out remains to the left of the selected word’s alignment span,
then the remaining piece of fan-out may contain another word. Remember that the maximum hypothesized word is 12
characters, which is long enough to hold 3 or 4 small words from the lexicon simultaneously. The remaining left portion is
processed recursively, recalculating new signal values and searching for words within that piece. As lexicon matches
are selected, they are pushed onto the stack. The recursion continues until all of the fan-out to the left of the top-level
selected word has been exhaustively processed. The selected lexicon matches are then popped off the stack in correct
reading order, and a new fan-out is rebuilt beginning with the first character to the right of the top-level selected word’s
alignment span. For example, in Figure 38 the hypothesized word “STCTES” is selected with a maximum signal of
0.254. The next fan-out will begin with L, starting from the position in the text line “LNORDERTOFOR”. If no
hypothesized words are selected within the current fan-out, then the processing advances one character in the text line,
and the fan-out begins from that point.

Through this approach, segmentation and classification errors are corrected, and word boundaries are
automatically identified. The results of the dictionary-based postprocessing are stored to the hypothesis FET structure. All
the words recognized by processing fanout signals are concatenated together into a single string separated by spaces
and stored as the hsf_33 field’s value in the hypothesis FET. No confidence values are stored in the confidence FET
structure. An example of the results of dictionary-based postprocessing can be seen in Figure 40. These results were
obtained from the raw classifications shown in Figure 39.


WETHEPEOPEOPTHRUNIEEDSTATESIINOTrORMAMOREIPEHECZUNONIESEEBLIIHJUS 

TI<:EnNJlJREDOMESnCIRANGUn(3>RO\aHFORTHE(rrMMONDETENEIPKOmCETHEY 

TENERALWEUTJEZNDSEWRETHCBKSSINDJJFLLBERTrOJOVRSELVUIDOURPOSTERI 

YRIDOORJAD^MDESTZBLISLTHOCONSTrrUTIONFORTHTUNZEDSTNTESOTANEVlICA 

Figure 39. The raw classifications from running hsfsys on the form image data/f0000_14/f0000_14.pct.


WE THE PEOPLE THE UNITED STATES IN FORM A MORE PERFECT UNION ESTABLISH JUSTICE
INSURE DOMESTIC TRANQUILITY PROVIDE FOR THE COMMON DEFENSE OUR THE
GENERAL WELFARE AND SECURE THE BLESSINGS OF LIBERTY TO OURSELVES OUR POSTERITY
DO ORDAIN ESTABLISH THE CONSTITUTION FOR THE UNITED STATES AMERICA

Figure 40. The results of dictionary-based postprocessing on the raw classifications shown in Figure 39.

10. PERFORMANCE AND TIMING STATISTICS


Running hsfsys produces a hypothesis file and a confidence file. The hypothesis file contains the characters
recognized in each field on the input HSF form, and the confidence file contains the confidence values produced for
each character classification reported in the hypothesis file. Both of these are FET files and are compatible as input files
to the NIST Scoring Package.

In a sample of the first 500 writers in SD1, the system achieves a character output accuracy of 92.9% (59308/
63830) on digit fields with no character rejections, where character output accuracy CHAR5 is defined to be:

$$\mathrm{CHAR5} = \frac{AC^{chrrec}_{char}}{total^{ref}_{chr}} \tag{25}$$


Running the recognition system using the small memory mode option, hsfsys achieves a character output accuracy of
92.3% (58916/63830) with one third the training prototypes used by default. The numerator represents all the segmented
character images correctly classified by the recognition system that are not rejected. The denominator represents
the total number of characters that can possibly be recognized on the completed forms. The system achieves a
character output accuracy of 75.3% (9611/12766) on lower case fields and 84.5% (10784/12766) on upper case fields
without the use of context-based postprocessing.

Hsfsys  achieves  a character  decision  accuracy  of  95.4%  (59308/62167)  with  no  rejections,  where  character 
decision  accuracy  CHAR3  is  defined  to  be: 


$$\mathrm{CHAR3} = \frac{AC^{chrrec}_{char}}{AC^{chrrec}_{char} + AI^{chrrec}_{char}} \tag{26}$$


Equation  (26)  has  the  same  numerator  as  Equation  (25),  but  the  denominator  represents  the  total  number  of  segmented 
character  images  presented  to  the  system’s  classifier  that  are  not  rejected.  Equation  (26)  does  not  include  character 
deletions  within  the  system.  At  a rejection  rate  of  4.6%,  the  system  achieves  a character  decision  accuracy  of  97.4% 
(57671/59217). 
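
As a quick check of the two character-level measures, the following sketch recomputes the quoted percentages from the counts given in the text. The function and variable names are illustrative only, and the count of accepted incorrect classifications is derived here as the difference between the two quoted denominators and numerators rather than taken from the report.

```python
def output_accuracy(accepted_correct, total_reference_chars):
    """Correct, non-rejected classifications over all characters on the forms."""
    return accepted_correct / total_reference_chars


def decision_accuracy(accepted_correct, accepted_incorrect):
    """Correct classifications over all non-rejected classifier decisions."""
    return accepted_correct / (accepted_correct + accepted_incorrect)


# Digit fields, no rejections (counts quoted in the text).
correct = 59308
reference_chars = 63830
classified = 62167
incorrect = classified - correct  # derived here, not quoted in the text

print(f"output accuracy:   {100 * output_accuracy(correct, reference_chars):.1f}%")
print(f"decision accuracy: {100 * decision_accuracy(correct, incorrect):.1f}%")

# Digit fields at a 4.6% rejection rate (counts quoted in the text).
print(f"decision accuracy with rejection: {100 * decision_accuracy(57671, 59217 - 57671):.1f}%")
```

The two measures share a numerator; only the denominator changes, which is why decision accuracy is always at least as high as output accuracy when characters are deleted by the system.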

The system achieves a field accuracy of 79.1% (10878/13748) with no characters rejected, where field accuracy
CHRFLD1 is defined to be:

$$\mathrm{CHRFLD1} = \frac{AC^{fldrec}}{total_{chrfld}} \tag{27}$$

The numerator of Equation (27) represents the total number of fields correctly recognized by hsfsys. In order for a field
to be considered correctly recognized, no remaining characters in the field value after rejection can be substituted,
inserted, or deleted. The denominator represents the total number of fields requested to be recognized by the system.

The recognition system achieves a word accuracy of 60.5% (15439/25532) when applying a limited-size dictionary
to the character classifications made on the Constitution paragraph. Running the recognition system using the
small memory mode option, hsfsys achieves a word accuracy of 59.0% (15076/25532) with one half the training
prototypes used by default. The word accuracy is computed by tokenizing each word, using the Scoring Package to align
the word tokens, and then accumulating the number of substituted, inserted, and deleted words.
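
The word-level scoring described above can be illustrated with a small sketch. The alignment used by the NIST Scoring Package is not described in this section, so a standard minimum-edit-distance (Levenshtein) alignment over word tokens is used here as a stand-in; the function name and the example strings are illustrative.

```python
def align_counts(ref, hyp):
    """Align two token lists by minimum edit distance and tally the outcomes."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to classify each aligned position.
    counts = {"correct": 0, "substituted": 0, "inserted": 0, "deleted": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["correct" if ref[i - 1] == hyp[j - 1] else "substituted"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deleted"] += 1    # reference word missing from the hypothesis
            i -= 1
        else:
            counts["inserted"] += 1   # extra word present only in the hypothesis
            j -= 1
    return counts


reference = "WE THE PEOPLE OF THE UNITED STATES".split()
hypothesis = "WE THE PEOPLE THE UNITED STATES".split()
counts = align_counts(reference, hypothesis)
print(counts, f"word accuracy = {counts['correct'] / len(reference):.3f}")
```

Here word accuracy is taken as the number of correctly aligned words divided by the number of reference words, which matches the form of the ratios quoted in the text.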

A timing option can also be selected when invoking hsfsys, in which case a timing file is produced upon system
completion. Figure 41 shows the collective time spent on each major task within the recognition system. The times
recorded in the timing file are actually reported at a much finer detail, so that many useful tables can be compiled for
conducting various analyses. These timing results were achieved on an SGI Challenge (IP19) computer listed in Figure
42. Notice that most of the time is spent classifying characters (29.2%) and conducting dictionary-based postprocessing
(26.0%). Figure 42 lists all the different computers on which the recognition system was successfully compiled and
tested. The last column in the table shows the average user time required by each machine to process a single form.
These averages were compiled from the times produced on the 10 HSF forms provided in the top-level directory data.

Figure 41. Report compiled from timing statistics generated by hsfsys on an SGI Challenge (IP19).
Vendor   Model             Operating System       CPUs   Memory    Time
                                                  1      32 Mb     28.3
HP                                                1      64 Mb
IBM
SGI                                               8
Sun                        SunOS 4.1.3            1      64 Mb     81.8
Sun      SPARCstation 10   SunOS 4.1.3            1      32 Mb     63.0
Sun      SPARCstation 10   SunOS 5.2 (Solaris)    2      128 Mb    39.6

Figure 42. Table of timings from different computers on which the standard reference recognition system has been
successfully ported and tested.

*All computers, including those with multiple processors, were compiled and tested serially.

**The Sun IPC was run using the small memory mode option due to its limited memory.

11. FINAL COMMENTS


A number of NIST Internal Reports (NISTIRs) have been referenced in this document. These reports are provided
in PostScript format in the top-level directory doc. The file doc/hsfsys.ps contains this specific document. These
reports, along with many other NIST Image Recognition Group publications, are available in PostScript format across
the network via anonymous ftp on sequoyah.ncsl.nist.gov. To request a paper copy of any of these NISTIRs, please
contact


CSL  Publications 

NIST 

225/B151 

Gaithersburg,  MD  20899 
voice:  (301)  975-2821 

This report documents the NIST standard reference recognition system hsfsys in terms of its installation,
organization, and functionality. The system has been successfully compiled and tested on a number of different vendors’
UNIX workstations. It is the responsibility of the distribution recipient to port the software to their specific computer
architecture. The source code is written primarily in C with two supporting utilities containing FORTRAN
components. The standard reference recognition system is organized into 11 libraries. In all, there are approximately 19,000
lines of code supporting more than 550 subroutines. Source code is provided for a wide variety of utilities that have
application to many other types of problems.

Distributions of the NIST standard reference recognition system can be obtained free of charge on CD-ROM
by sending a letter of request to the primary author. Requests for distribution made by electronic mail will not be
accepted; however, electronic mail is encouraged for technical questions once the distribution has been received. This
system or any portion of this system may be used without restrictions because it was created with U.S. government
funding. Redistribution of this standard reference system is strongly discouraged as any subsequent corrections or
updates will be sent to registered recipients only. This software was produced by NIST, an agency of the U.S.
government, and by statute is not subject to copyright in the United States. Recipients of this software assume all
responsibilities associated with its operation, modification, and maintenance.

APPENDIX A. UTILITY mis2evt


The NIST standard reference recognition system uses the Karhunen Loeve (KL) transform to extract features
for classifying segmented character images. This transform is obtained by projecting a character image onto
eigenvectors of the covariance computed from a set of training images. The mathematical details of the KL transform are
provided in Section 9.2.2.5.
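
The projection described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the code in mis2evt: the function names are hypothetical, the toy images are random 8 by 8 arrays rather than 32 by 32 character images, and subtracting the training mean before projecting is an assumption, since the exact formulation is given in Section 9.2.2.5 rather than here.

```python
import numpy as np


def kl_basis(train, n_features):
    """Return (mean, basis); basis columns are the leading covariance eigenvectors.

    `train` holds one flattened image per row.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)            # covariance across pixels
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1][:n_features]
    return mean, eigvecs[:, order]


def kl_features(image, mean, basis):
    """Project one image onto the stored eigenvectors to obtain its KL features."""
    return (image.ravel() - mean) @ basis


rng = np.random.default_rng(0)
train = (rng.random((200, 64)) > 0.5).astype(float)   # 200 toy binary 8x8 images
mean, basis = kl_basis(train, n_features=4)
features = kl_features(train[0].reshape(8, 8), mean, basis)
print(basis.shape, features.shape)
```

The expensive step is the eigendecomposition in `kl_basis`; `kl_features` is a single matrix-vector product, which is why it pays to compute the basis once and store it.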

The eigenvectors are computed off-line and stored in a basis function file (described in Section 9.2.1.1)
because computing the eigenvectors of a large matrix is very expensive. The standard reference recognition system
hsfsys reads the basis function file during its initialization, and then reuses the eigenvectors across all the character
images segmented from fields of a specified type (digit, lower case, upper case, or Constitution box). The program
mis2evt applies the KL transform to segmented character images and generates a basis function file. The program’s
main routine and FORTRAN subroutines are located in the distribution directory src/bin/mis2evt. The command line
usage of mis2evt is as follows:

% mis2evt
Usage: mis2evt:
    -n  for 128x128 input write normed+sheared 32x32 intermediate misfiles
    -v  be verbose - notify completion of each image
    nrequiredevts evtfile mfs_of_misfiles

Arguments: 

• The first argument nrequiredevts specifies the number of eigenvectors to be written to the output file. It is
also the number of KL features that will ultimately be extracted from each binary image using the associated
utility mis2pat. This integer determines the dimensionality of the feature vectors that are produced for
classification. Its upper bound is the image dimensionality (which is 32x32 = 1024). Typically, this argument is
specified to be much smaller than 1024 because the KL transform optimally compacts the representation of
the image data into its first few coefficients (features). Hsfsys uses a value of 64. Reference 22 documents
an investigation of the dependency of classification error on feature vector dimensionality.

• The second argument evtfile specifies the name of the output basis function file.

• The third argument mfs_of_misfiles specifies a text file that lists the names of all the MIS files containing
images that will be used to calculate the covariance matrix. This argument is an MFS file with the first line
containing an integer indicating the number of MIS files that follow. The remaining lines in the MFS file
contain MIS file names, one name per line.
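
The MFS layout described for the third argument (a count on the first line, then one file name per line) is simple enough to read and write with a few lines of Python. This sketch is not part of the distribution; the function names and the example file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_mfs(path, names):
    """Write an MFS file: entry count on the first line, then one name per line."""
    Path(path).write_text("\n".join([str(len(names)), *names]) + "\n")


def read_mfs(path):
    """Read an MFS file and check that the header count matches the entries."""
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty MFS file")
    count, names = int(lines[0]), lines[1:]
    if len(names) != count:
        raise ValueError(f"header says {count} entries, found {len(names)}")
    return names


with tempfile.TemporaryDirectory() as tmp:
    listing = os.path.join(tmp, "example.ml")          # hypothetical file name
    write_mfs(listing, ["part1.mis", "part2.mis"])     # hypothetical MIS file names
    print(read_mfs(listing))
```

Checking the header count on read catches a truncated or hand-edited list before any expensive computation starts.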

Options: 

• The option “-n” specifies the storing of intermediate normalized character images. Mis2evt can process
binary images that are either (128 by 128) or (32 by 32). In the case of the former, the program invokes a
size normalization utility to produce 32 by 32 images and then applies a shear transformation to reduce slant
variations. If the input images are already 32 by 32, this flag has no effect. If normalization does occur, the
resulting normalized images are stored to MIS files having the same name as those listed in the MFS file,
with the additional extension 32 appended. These intermediate files offer computational gains because usually
the same images are used with mis2pat.

• The  option  “-v”  produces  messages  to  standard  output  signifying  the  completion  of  each  MIS  file  and  other 
computation  steps. 

This program is computationally expensive and may require as long as 60 minutes to compute the eigenvectors
for a large set (50,000 characters) of images. The program mis2evt was used to generate the basis function files
provided with this distribution in the top-level directory weights and ending with the extension evt. These files contain
eigenvectors computed from the images provided in the top-level directory train. The MFS files used as arguments to
mis2evt are also provided in weights and end with the extension ml. For example, the basis function file tdl3_l.evt was
generated with the following command:

% mis2evt -v 64 tdl3_l.evt tdl3_l.ml




APPENDIX  B.  UTILITY  mis2pat 


A second supporting utility is provided with this distribution. Mis2pat takes a set of training images along with
the eigenvectors generated by mis2evt and creates feature vectors that can be used as prototypes for training classifiers
(in this case PNN). Typically, the same images used to compute the eigenvectors are used here to generate prototype
vectors. The program mis2pat also builds a kd-tree as described in Section 9.2.2.6. The prototypes along with their
class assignments and kd-tree are stored in a prototype file (described in Section 9.2.1.2). In addition, mis2pat
computes median vectors from the prototype vectors and stores them in a median vector file. The program’s main routine
and FORTRAN subroutines are located in src/bin/mis2pat. The command line usage of mis2pat is as follows:

% mis2pat
Usage: mis2pat:
    -h  accept hexadecimal class files
    -n  with 128x128 images write normed+sheared 32x32 intermediate misfiles
    -v  be verbose - notify completion of each image
    classset evtfile outfile mfs_of_clsfiles mfs_of_misfiles


Arguments: 

• The first argument classset specifies the name of a text file (MFS file) containing the labels assigned to each
class. The integer on the first line of the file indicates the number of classes following, and the remaining
lines contain one class label per line. For example, a digit classifier uses ten classes labeled 0 through 9.

• The second argument evtfile specifies the basis function file containing eigenvectors computed by mis2evt.
The number of features in each output vector is determined by the number of eigenvectors in this file.

• The third argument outfile specifies the name of the output prototype file. The name of the output median
vector file is the same except with a second extension med appended to the outfile argument.

• The final arguments are the names of text files (MFS files) that contain listings of file names. The argument
mfs_of_clsfiles lists file names containing class assignments corresponding to the images in the MIS files
listed in the argument mfs_of_misfiles. Each class assignment file must have the same number of class
assignments as there are images in its corresponding MIS file, and the classes assigned must be consistent
with those listed in the argument classset.

Options: 

• The option “-h” specifies that the class labels listed in the classset file are to be converted to ASCII character
values represented in hexadecimal. All the class assignments in the files listed in the argument mfs_of_clsfiles
use the convention where [30-39] represent digits, [41-5a] represent upper case, and [61-7a]
represent lower case. If the classset file contains alphabetic representations such as [0-9], [A-Z], and [a-z],
then this flag must be used to effect conversion of these labels to their hexadecimal equivalents.

• The option “-n” specifies the storing of intermediate normalized character images. Mis2pat can process
binary images that are either (128 by 128) or (32 by 32). In the case of the former, the program invokes size
and slant normalization utilities to produce 32 by 32 images. If the input images are already 32 by 32, this
flag has no effect. If normalization does occur, the resulting normalized images are stored to MIS files having
the same name as those listed in mfs_of_misfiles, with the extension 32 appended.

• The option “-v” produces messages to standard output signifying the completion of each MIS file.
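
The hexadecimal convention that the “-h” option converts to is simply the two-digit ASCII code of each class character. A short sketch, with a hypothetical function name, shows the mapping and reproduces the range endpoints quoted above:

```python
def to_hex_label(label):
    """Convert a one-character class label to its two-digit hexadecimal ASCII code."""
    if len(label) != 1:
        raise ValueError("expected a single character")
    return format(ord(label), "02x")


# Endpoints of the digit, upper case, and lower case ranges.
print([to_hex_label(c) for c in "09AZaz"])  # ['30', '39', '41', '5a', '61', '7a']
```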

This program was used to generate the prototype files provided with this distribution in the top-level directory
weights and ending with the extension pat. These files contain KL feature vectors, their associated classes, and a
kd-tree as described in Section 9.2.1.2. The feature vectors were computed using the eigenvectors found in the same
directory and from the images provided in the top-level directory train. The MFS files used as arguments to mis2pat are
also provided in weights, as are the classset files which end with the extension set. The class assignment files are listed
in files ending with the extension cl, whereas the MIS files are listed in files ending with the extension ml. For example,
the prototype file tdl3_l.pat was generated with the following command:

% mis2pat -vh l.set tdl3_l.evt tdl3_l.pat tdl3_l.cl tdl3_l.ml

% mv tdl3_l.pat.med tdl3_l.med

The second command renames the generated median vector file from its default name tdl3_l.pat.med to the
distribution name tdl3_l.med.

APPENDIX C. 2nd Census OCR Systems Conference


In  February  of  1994,  the  Second  Census  Optical  Character  Recognition  Systems  Conference  was  sponsored 
by  the  Bureau  of  the  Census  and  run  by  NIST.  Ten  different  organizations  submitted  recognition  results  to  NIST  for 
scoring.  The  task  of  the  conference  was  to  read,  via  machine,  a small  handprinted  portion  of  the  1990  Census  Long 
Form.  This  part  of  the  form  contains  three  questions  related  to  occupation.  Three  rectangular  regions  were  provided 
on  the  form  in  which  people  were  instructed  to  write  their  answers.  Images  were  obtained  from  both  microfilm  and 
paper,  and  the  images  were  cropped  creating  miniforms  containing  just  the  test  questions.  The  details  of  the  conference 
and the conclusions drawn from the results are presented in Reference 12. This is the most comprehensive test of
recognition systems of this type done to date.

The  NIST  standard  reference  optical  character  recognition  system  is  similar  to  the  NIST  system  used  in  the 
conference.  Both  systems  use  connected  components  for  segmentation;  they  use  the  same  size  and  slant-normalization 
techniques;  they  both  use  the  KL  transform  to  compute  feature  vectors;  and  a PNN  classifier  is  used  in  both  systems. 

Despite their similarities, there are some significant differences between the standard reference recognition
system and the conference system. The standard reference recognition system conducts form removal prior to conducting
field isolation, whereas the conference system simply registered the image, extracted the fields, and removed form
artifacts. Systems that conducted some type of form removal in the conference achieved better results than those that
did not. Also, the dictionary-based postprocessing used in the standard reference recognition system is substantially
different from that used in the conference system. The standard reference recognition system uses the word-based
process described in Section 9.5.1.3 to process the handprint written in the Constitution box of Handwriting Sample Forms
(HSF forms). The conference system used phrase-based dictionary matching where the system corrected errors using
lists of multiple-word phrases rather than using single-word dictionaries.

The application of these two systems is also significantly different. First, the image quality of the HSF forms
distributed with SD1 and SD3 is better than the quality of images scanned from the Census Long Forms. Second, the
standard reference recognition system uses a limited-size dictionary (38 words) when processing the Constitution box.
This is in contrast to the conference where dictionaries of more than 60,000 multiple-word phrases were used.
Although the dictionary is restricted when reading the Constitution box, the segmentation of the characters is
complicated. The segmentation of this multiple-line text paragraph requires an elaborate solution, such as the sequence
reconstruction described in Section 9.5.1.2. This is considerably more difficult than segmenting the single-line (occasionally
more than one line) fields on the conference miniforms.

It is difficult to compare the performance of the standard reference recognition system to the conference
system due to the differences between the two systems and their applications. However, some comparisons can be made
at the word recognition level. The word accuracy of the standard reference recognition system was 61% on the
Constitution box. The average field in the conference contained two words so that this level of accuracy, if sustained on
the conference test, would have resulted in a 37% field accuracy rate. In the conference, NIST achieved a 25% field
accuracy. Based on these numbers it is probable that the standard reference recognition system is better than the
conference system.
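
The 37% figure follows if a two-word field is counted as correct only when both of its words are correct and, as a simplifying assumption not stated explicitly in the text, the two word recognitions are treated as independent events with the same 61% accuracy:

```latex
\[
P(\text{field correct}) \approx P(\text{word correct})^{2} = 0.61^{2} \approx 0.37
\]
```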

The expected word accuracy for the best conference system was about 76%, so on a word basis we would
expect the standard reference recognition system to have an error rate about 15 percentage points (76%-61%) higher
than the best conference system. This is comparable to the median system reported at the conference. The difference
between the best conference systems and the NIST standard reference recognition system is primarily due to segmentation.
Unlike the best conference systems, the standard reference recognition system does not use any techniques for
oversegmenting characters and reconstructing words.